| New High I/O EC2 Instance Type - hi1.4xlarge - 2 TB of SSD-Backed Storage (aws.typepad.com) | | 192 points by jeffbarr 53 days ago | comments |
| | bravura 53 days ago | link
What are good use-cases for on-demand high I/O servers?
At $3.10/hr, these instances work out to $2k/mo.
There are probably many more cost-effective options if you want a 2TB SSD server. Since the benefit of using EC2 is that you can provision instances elastically, what are the sorts of scenarios in which one needs to provision high I/O servers elastically? [edit: A few minutes of Googling, and I can't find any dedicated servers with 2 TB of SSD.] ----- |
| | | mbreese 53 days ago | link
I do genome mapping where our indexes won't entirely fit in memory. It would be very handy to be able to spin up a few of these instances, load the indexes from an EBS volume onto the local SSDs, then run for a couple of hours or so. This is a very I/O intensive job that we need to run about once a week, but the rest of the time the machines could sit idle.
SSDs would make our jobs run significantly faster. So much so that we've toyed with the idea of adding SSDs to our in-house cluster, but couldn't quite justify the costs. This might actually shift the cost balance enough to get our lab to migrate to EC2 from our in-house or university cluster. ----- |
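As a rough check on the numbers in this sub-thread, the sketch below compares an always-on hi1.4xlarge at the quoted $3.10/hr with a weekly burst job like the one described above. The 30-day month, the instance count, and the run length are illustrative assumptions, not figures from the thread.

```python
ON_DEMAND_RATE = 3.10       # $/hr, the on-demand price quoted in the thread
HOURS_PER_MONTH = 24 * 30   # assumption: a 30-day month


def always_on_cost(rate: float = ON_DEMAND_RATE) -> float:
    """Monthly cost of one instance that is never shut down."""
    return rate * HOURS_PER_MONTH


def burst_cost(instances: int, hours_per_run: float, runs_per_month: int,
               rate: float = ON_DEMAND_RATE) -> float:
    """Monthly cost of a job that only rents instances while it runs."""
    return rate * instances * hours_per_run * runs_per_month


if __name__ == "__main__":
    print(f"always on:    ${always_on_cost():,.2f}/mo")
    # Hypothetical weekly job: 3 instances, 3 hours per run, 4 runs a month.
    print(f"weekly burst: ${burst_cost(3, 3, 4):,.2f}/mo")
```

Under these assumptions the always-on figure is about $2.2k/mo, which the original comment rounds to $2k, while the burst workload costs a small fraction of that.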
| | | Gmo 53 days ago | link
We are also facing the same kind of problems in my company, regarding genome assembly and mapping.
That's definitely something we will look into :) ----- |
| | | revorad 53 days ago | link
I'm working on a data visualisation app, which is getting a lot of interest from biologists and bioinformaticians. I'd like to learn a bit more about your work. Can I email you somewhere? Or please drop me an email at hrishi@prettygraph.com. Thanks!----- |
| | | zurn 52 days ago | link
How long would it take to read the 2 TB from EBS?----- |
| | | ybother 52 days ago | link
Based on the benchmarks linked below, it shouldn't take more than half a day under the worst circumstances (a single EBS drive with crappy performance), and if you RAID together enough drives, you can do it in about an hour. Correct me if I'm wrong, but you pay for EBS by size, not by physical disk, so the more you can split up your data into blocks, the more performance you're going to get.
stu.mp/2009/12/disk-io-and-throughput-benchmarks-on-a... ----- |
| | | cperciva 52 days ago | link
2 TB in one hour is about 4.9 Gbps. Cluster Computes have 10 GbE internally, but I'd be surprised if they have that all the way out to EBS.----- |
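The throughput figures in the two comments above can be reproduced with a short sketch. It assumes "2 TB" is read as 2 TiB (binary), which is the reading that yields roughly 4.9 Gbps; the decimal reading is shown for comparison. The function names are illustrative.

```python
TB = 10**12    # decimal terabyte, in bytes
TIB = 2**40    # binary tebibyte, in bytes


def required_gbps(num_bytes: int, seconds: float) -> float:
    """Sustained network rate, in gigabits per second, to move num_bytes in the given time."""
    return num_bytes * 8 / seconds / 1e9


def required_mb_per_s(num_bytes: int, seconds: float) -> float:
    """Sustained disk rate, in megabytes per second, to move num_bytes in the given time."""
    return num_bytes / seconds / 1e6


if __name__ == "__main__":
    hour = 3600
    print(f"2 TiB in 1 h:  {required_gbps(2 * TIB, hour):.2f} Gbps")
    print(f"2 TB  in 1 h:  {required_gbps(2 * TB, hour):.2f} Gbps")
    print(f"2 TiB in 12 h: {required_mb_per_s(2 * TIB, 12 * hour):.1f} MB/s")
```

The "half a day" estimate corresponds to a single volume sustaining about 51 MB/s, while the one-hour estimate needs close to half of a 10 GbE link.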
| | | lonnyk 53 days ago | link
Any chance we could get an example of the data set and the calculation that needs to be done?----- |
| | | mbreese 52 days ago | link
You can get some of the data from the 1000 genomes project directly from Amazon, so you don't need to pay to download it. There's about 200TB of data there (so far).
aws.amazon.com/1000genomes/
What I'm working on is mapping those short sequences (50-75 bases) to the genome and then either looking for mutations or expression levels (how many of those reads map to a particular location). There are a couple of ways to do the mapping, but most these days use either a big hash table or a Burrows-Wheeler transform.
en.wikipedia.org/wiki/Burrows%E2%80%93Wheeler_transfo...
And that's all just to get the data that you can then do something else with (gene expression, variation modeling, etc...). ----- |
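For readers unfamiliar with the Burrows-Wheeler transform mentioned above, here is a minimal textbook sketch. It sorts all rotations of the input, which is far too slow and memory-hungry for genome-sized data; it is not how any particular aligner implements it, and the `$` sentinel and function names are assumptions made for illustration.

```python
def bwt(text: str, sentinel: str = "$") -> str:
    """Return the Burrows-Wheeler transform of text (naive rotation sort)."""
    if sentinel in text:
        raise ValueError("text must not contain the sentinel character")
    s = text + sentinel
    # Build every rotation, sort them, and keep the last column.
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rotation[-1] for rotation in rotations)


def inverse_bwt(last_column: str, sentinel: str = "$") -> str:
    """Recover the original text by repeatedly prepending and re-sorting."""
    table = [""] * len(last_column)
    for _ in range(len(last_column)):
        table = sorted(c + row for c, row in zip(last_column, table))
    # The row that ends with the sentinel is the original string.
    original = next(row for row in table if row.endswith(sentinel))
    return original[:-1]


if __name__ == "__main__":
    encoded = bwt("banana")
    print(encoded)               # annb$aa
    print(inverse_bwt(encoded))  # banana
```

The transform groups identical characters that share a following context, which is what makes the indexed search structures built on top of it compact.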
| | | Gmo 53 days ago | link
Well, the raw output of a typical so-called "next gen sequencing" machine (these are actually very much current gen) is around 1TB (at least, for the ones we used here).
This is the raw file though, so once processed (but not yet analyzed) I believe we have sizes around 50 to 100GB (but that's not really what I work on so don't quote me fully on this). The next steps vary depending on what you want to do exactly, but they usually involve alignment of base pairs (basically, trying to tie together sequences of DNA by their ends by seeing if they "fit"). ----- |
| | | Gmo 53 days ago | link
I said by their end but it can also be the full sequence, depending on the job----- |
| | | micro_cam 52 days ago | link
Essentially you sequence tons of short bits of DNA and then either fit them together (assemble) or fit them to a reference (align). You can find example data sets in the Short Read Archive:
www.ncbi.nlm.nih.gov/sra/
Cloudburst (a Hadoop-based aligner) has a good description of an algorithm:
sourceforge.net/apps/mediawiki/cloudburst-bio/index.p...
Though they can get much more sophisticated and there are a number of open and closed source implementations... I only link this one because of the quality of the figure. The data sets we work with in my group can be up to 400 GB of compressed text for the reads from a single individual. Another example from biology with a similar computational profile would be searching through a huge number of mass spectrometer outputs to identify the components in a new sample. ----- |
| | | saurik 53 days ago | link
That is only one of the benefits of EC2. If you are not using elasticity, then you have to factor in the reserved instance pricing, which drops the prices down by 71% (as in, to 29% of the list price; and I mean even if you include the up-front cost: that's overall savings). You, like most people who comment on the price of EC2, do not seem to be taking this into consideration. :(----- |
| | | joe_bloggs 53 days ago | link
What are some of the other benefits, other than elasticity or on-demand ness, and not having to spend a huge amount upfront?----- |
| | | saurik 53 days ago | link
One example, which stems from "on-demand ness" (which you added: I was only responding to "elasticity"), is that you can do "test runs" of migrations and deployments without even thinking about it: you can rent, for just an hour, a setup identical to your existing one, often based on a consistent and atomic snapshot of your production machine, so you can try something "likely correct but possibly horribly wrong"; then, if it works, rather than replicating the change on your "real" machine, you can just cut over to the new one and shut down the old.
Way too many people seem to believe that the only benefit of "on-demand" is "elasticity", and then make bogus arguments here that "if you can plan your traffic you shouldn't be using EC2": EC2 is cheaper than people like to claim (and is in fact quite price competitive) and your ability to turn on/off machines on a whim changes the way you look at hardware so drastically that, in all honesty, it makes traditional ways of dealing with hardware seem draconian and only worth putting up with if you are dealing with some weird corner case or have horribly special requirements. ----- |
| | | moe 52 days ago | link
> EC2 is cheaper than people like to claim
How often do we have to repeat this argument here on HN? When running 24/7, EC2 instances are 2-3x more expensive than the cheapest equivalent rented-server options and orders of magnitude more expensive than the physical hardware if you buy it. The numbers have been recited countless times, I'm not digging them up yet again. So, no, EC2 is not cost effective for steady loads at the mid-range. EC2 shines at the very low and the very high end and in specific workloads, i.e. it shines when the benefits can be quantified to an amount greater than the price difference. ----- |
| | | saurik 52 days ago | link
I would be very interested in knowing what factors you are trading off for "equivalent" other than "on-demand". Many of my friends use co-location services for their businesses, and most of them purchase only on price, and their servers honestly suck: they have high latency, they are unstable, they don't have remote serial console access... they are living in a ghetto that burns tons of their time into "becoming server people".
If you find a company that has reasonable support, reliable servers, good datacenters, and the minimal features required to debug issues remotely, then you are looking at prices fairly equivalent to those offered by EC2 heavy-utilization reserved instances (and are going to end up with a similar contract length anyway). If this somehow doesn't work out: call Amazon AWS's sales division and see if you are compelling enough for them to negotiate with (they totally have a sales division, and they really do "want your business").
Regardless, your choice of quote is really bothersome: "people like to claim" that EC2 is as expensive as their on-demand list prices, and that's a fact clearly demonstrable by the person I'm responding to (who is quite clearly and obviously claiming EC2 is more expensive than it really is) and one that is not defensible as the price you should be looking at is the heavy-utilization reserved pricing; if you'd like to respond to my comment "and is in fact quite price competitive" then you should quote that and adjust your argument appropriately.
Honestly, the history of HN is not much better (as I scour around trying to find the "numbers" you claim "have been recited countless times"). It is actually difficult to find people who don't claim that Amazon EC2 is more expensive than it is; I'm almost wondering if you and I are living on different versions of the site...
"EC2 is about 10-20 times more expensive than dedicated hosting. Even if reserved instances save us 22% over 3 years, it still doesn't even come close."
-- cmer
^ No, EC2 reserved instances save you 71% over 3 years.
"It costs $576/mo to run an extra large EC2 instance fulltime" -- stephenjudkins
^ No, even two years ago (before "heavy utilization reserved instances") you could drop this price by 66% to $195.84/mo.
"With EC2 prices at about $0.10 per hour, I can't imagine ever using a service with such a high premium." -- apinstein
^ Obviously: no, but the fact that this person is angry about the price of a small instance at $72/mo is quite telling; he isn't willing to go lower than $20/mo.
I found a price comparison by vladd from earlier this year, comparing a high-end VPS to EC2's largest offering, coming up with a nearly 10x difference, but the server is entirely useless: it is a consumer-level product running non-ECC RAM. Later comments claim the same hosting company has "competitively priced servers with ECC ram".
A couple months ago I found a thread that linked to a fairly detailed argument[1] stating that EC2 instances are 2-3x more expensive than a VPS. However, this person again is performing a comparison with non-ECC hardware. What damns this comparison, however, is that he is not taking advantage of 3-year reserved instances for a long-term high-end use case: his numbers seem to be based on 49% off, when he can easily get 71% off, nearly a 2x difference. <- Again, EC2 is cheaper than people like to claim.
[1] codemonkeyism.com/dark-side-virtualized-servers-cloud...
Seriously: I can't find anyone who is actually doing legitimate comparisons of Amazon's offerings.
People either compare EC2 to "I spent a week of time negotiating a deal to take over a bunch of hardware from a failing company down the street" (which, for the record, will also give you a great deal on chairs and office furniture: comparing the cloud to a fire sale is inane), assume "a server is a server is a server" and find "the cheapest" option (which seems to always have unreliable RAM), or (frankly: "and") fail to take into account Amazon's reserved instance discounts. That said, Hacker News has a really horrible search system, and I'm trying to find something kind of esoteric (as I want to search for a dollar sign, and thereby have to use proxies such as "expensive" and "cost"). I would thereby love to see an honest comparison, and am happily willing to believe that I missed it: do you have a link to such? ----- |
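The discount arithmetic quoted in the comment above can be checked with a small sketch. The percentages and prices are the ones cited in the thread; the 30-day month and the helper names are assumptions for illustration, and nothing here reflects an actual AWS price list.

```python
HOURS_PER_MONTH = 24 * 30   # assumption: a 30-day month


def monthly_from_hourly(hourly_rate: float) -> float:
    """Convert an hourly price to a monthly price for an always-on instance."""
    return hourly_rate * HOURS_PER_MONTH


def after_discount(list_price: float, discount_pct: float) -> float:
    """Price remaining after taking discount_pct percent off list_price."""
    return list_price * (100 - discount_pct) / 100


def price_ratio(discount_a_pct: float, discount_b_pct: float) -> float:
    """How many times more expensive discount A leaves you than discount B."""
    return (100 - discount_a_pct) / (100 - discount_b_pct)


if __name__ == "__main__":
    print(f"$0.10/hr small instance: ${monthly_from_hourly(0.10):.2f}/mo")
    print(f"$576/mo at 66% off:      ${after_discount(576, 66):.2f}/mo")
    print(f"49% off vs 71% off:      {price_ratio(49, 71):.2f}x")
```

The last figure comes out at about 1.76x, which is the "nearly a 2x difference" the comment refers to.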
| | | moe 52 days ago | link
> it is a consumer-level product running non-ECC RAM
Sorry to break it to you but EC2 instances are in all likelihood not running on ECC-Ram either[1]. If they had ECC-Ram then Amazon would probably prominently advertise that or at least respond when they are directly asked. If you can find a link to prove the opposite then I'll take that back.
> I would thereby love to see an honest comparison, and am happily willing to believe that I missed it: do you have a link to such?
You have probably already seen any of the blog-posts I could cite here, so I'll instead just try to wrap your two claims up:
1. You claim that dedicated servers are more labor intensive (setup, hardware failures) and require more staff. This is not my experience at all. In fact the complexity and idiosyncrasies of the AWS platform are much harder to abstract in the beginning, and no less labor intensive in the long term. You're just trading one set of problems (hardware issues) for a different one (cloud issues). What you may save on the hardware management front you have to spend on adapting your application for a cloud-environment.
2. You claim that equivalent hardware to an EC2 instance (with comparable performance, good support, network, etc.) would be roughly the same price as an EC2 instance. Sorry but that is laughable, when did you last benchmark an EC2 instance? Even a cheap rented dedicated server (hetzner, leaseweb, ovh) will normally give you twice the bang for buck on every key metric (I/O, Ram, CPU). And this quickly rises to beyond an order of magnitude when you start comparing EBS to a local array or a 256G Ram box to 256G Ram in EC2-instances. Where redundancy is a concern you can usually quite literally buy two of each and still be cheaper than EC2.
I'll say what I always say: EC2 does have its place. However for deployments in the range of 10-~50 servers you will in pretty much all cases save a lot of money by sticking with dedicated servers for the base-load. 
That is unless your app needs the cloud-flexibility, of course (most apps don't). What makes you believe this flexibility would come for free anyways? As all things it comes with a price-tag, and actually quite a hefty one in this case. [1] https://forums.aws.amazon.com/message.jspa?messageID=203167 ----- |
| | | wmf 52 days ago | link
They don't advertise it because it goes without saying that servers have ECC. EC2 uses Xeons and Opterons which only support ECC. It should only be a few percent more expensive, which is nothing when you consider the premium Amazon charges (which is something I definitely agree with you about).----- |
| | | moe 52 days ago | link
> because it goes without saying
I've been dealing with hosters and hardware for long enough to know that nothing goes without saying.
> Xeons and Opterons which only support ECC
Have you actually checked the CPU models they use? All I know is that Amazon uses a range of different CPUs, and some Xeon/Opteron models do accept non-ECC Ram.
> only be a few percent more expensive
In the past ECC DIMMs used to be significantly more expensive. Either way, as said, I don't know whether they're using ECC Ram. I agree it should go without saying, but I don't share your optimism that it actually does. I also wonder why they explicitly mention it for their GPU-instances when it goes without saying otherwise. ----- |
| | | spartango 52 days ago | link
FYI, EC2's machines do have ECC ram. They don't advertise it, though.----- |
| | | moe 52 days ago | link
Can you cite a source please?
A little more than an anonymous one-liner in a forum would really help my confidence... ----- |
| | | dpe82 50 days ago | link
Phoronix' benchmarking test suite has been able to detect underlying hardware: www.phoronix.com/scan.php?page=article&item=amazo...----- |
| | | spartango 52 days ago | link
No source aside from personal experience working with them, sorry. They avoid publicizing anything about the hardware/infrastructure if possible, partly so that they can change it without customer awareness and partly because they have secret sauce in places (no, ECC isn't secret sauce).----- |
| | | moe 52 days ago | link
Okay, I guess I'll take that as another datapoint, although honestly (no offense) I won't be basing decisions on it. ;)----- |
| | | bluesnowmonkey 52 days ago | link
> orders of magnitude more expensive than the physical hardware
Orders of magnitude? So, like 100x or 1000x more expensive? Really? ----- |
| | | adrianpike 52 days ago | link
For us it was more like a 10x, but a few things went into that:
- We found screamin' deals on hardware by snatching it up when it was available, not when we needed it.
- We were at a fairly cheap colo, and haggled hard to get the cheapest rack possible. We went on a tour, noticed they had _tons_ of empty racks, and used that as some leverage.
- We didn't add any additional ops overhead by having everyone responsible for ops.
We were in the $5k/month ballpark for EC2, and cut it to under $600 with a few grand outlay for hardware spread over the course of a quarter. That said, all of my current projects are on EC2 for the provisioning flexibility, and because I hate having to drive down to a datacenter at 4AM to swap a drive. ----- |
| | | saurik 52 days ago | link
Please tell me that you were including the cost of you driving to the datacenter at 4AM to swap a drive in the cost of the hardware in your price comparisons, as if you are just talking about the cost of the raw hardware and are not including the opportunity cost of all this time and energy spent haggling and performing maintenance, then this is simply a dishonest comparison: you could easily have been spending that time doing just about anything else, from working on new features for your product to improving your sales/investment pitch to simply sleeping (which will improve the quality of all of your work the next day). I'm also curious what your replacement plan is: are you intending to do this again next year, or are you intending to wait until all of your hardware starts failing and the operational overhead starts becoming painful? Finally, "having everyone responsible for ops" might mean that you didn't have to add a new explicit hire, but you can't claim that that isn't overhead: that is now state that everyone has to keep in their head and is a liability that could cause anyone to randomly get interrupted; it might even be cheaper in the long term to hire a new, more dedicated person than to reuse existing people.----- |
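The disagreement between the two comments above can be made concrete with a small payback sketch. The $5k and $600 monthly figures come from the thread; the $3,000 hardware outlay (standing in for "a few grand"), the ops hours, and the hourly value of engineer time are purely hypothetical placeholders.

```python
import math


def monthly_saving(ec2_monthly: float, colo_monthly: float,
                   ops_hours: float = 0.0, hourly_value: float = 0.0) -> float:
    """Net monthly saving from leaving EC2, after pricing in staff time spent on ops."""
    return ec2_monthly - colo_monthly - ops_hours * hourly_value


def payback_months(hardware_outlay: float, saving_per_month: float) -> float:
    """Months until the up-front hardware spend is recovered (inf if it never is)."""
    if saving_per_month <= 0:
        return math.inf
    return hardware_outlay / saving_per_month


if __name__ == "__main__":
    raw = monthly_saving(5000, 600)
    # Hypothetical: 20 hours a month of haggling and drive swaps, valued at $100/hr.
    with_labor = monthly_saving(5000, 600, ops_hours=20, hourly_value=100)
    for label, saving in (("hardware only", raw), ("with labor", with_labor)):
        print(f"{label}: ${saving:,.0f}/mo saved, "
              f"payback in {payback_months(3000, saving):.2f} months")
```

How the comparison comes out depends almost entirely on the value assigned to the labor term, which is the point of contention in the thread.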
| | | moe 52 days ago | link
Yes, 10x is common. 100x is a little contrived but possible when you spec out enough Ram in EC2 instances (>2T), then compare to a physical box over 3yrs.----- |
| | | maayank 53 days ago | link
"if it works, rather than replicating the change on your "real" machine, you can just cut over to the new one and shut down the old."
Interesting! Any other hacks enabled by EC2 and the like that make life much easier than real dedicated hardware? ----- |
| | | cwp 52 days ago | link
I do this too, not for risky migrations, but for daily updates. The app relies on a data service that's normally read-only, but gets fresh data daily. When everything was running on the bare metal, we had to schedule the updates for the middle of the night and carefully migrate the data in-place to avoid interrupting service. It would take 8 to 12 hours.
Since we moved to EC2, updating is simpler. The service runs on a micro instance. We launch a large instance to do all the CPU- and IO-intensive processing that prepares the new dataset, then launch a new micro instance, upload the dataset, run a few smoke tests, and if all is well, cut over to it. Because we're doing it off line, we were able to optimize the data processing for speed rather than low resource usage, and cut the runtime down to 45 minutes.
One thing that's often missed in discussions of IaaS versus bare metal is that the elasticity of a particular application can be affected by its design. When we were running on dedicated machines, we smoothed out the load to avoid idle hardware, but after moving to EC2 we concentrated it into spikes to get maximum productivity from running instances. In our case, spiky load is better from a business point of view, because serving data that's 1-25 hours old is better than data that's 8-32 hours old. ----- |
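The freshness figures in the comment above follow from simple arithmetic, sketched below. It assumes the source data is fresh when processing starts and that each dataset is served for one full refresh interval before being replaced; the function name is illustrative.

```python
def staleness_window(processing_hours: float,
                     refresh_interval_hours: float = 24) -> tuple[float, float]:
    """Return (youngest, oldest) age in hours of the data being served.

    The data is already processing_hours old when it goes live, and it
    keeps ageing until the next refresh replaces it.
    """
    youngest = processing_hours
    oldest = processing_hours + refresh_interval_hours
    return youngest, oldest


if __name__ == "__main__":
    print(staleness_window(8))     # in-place migration on bare metal
    print(staleness_window(0.75))  # 45-minute batch job on a large instance
```

With an 8-hour job the served data is 8 to 32 hours old; with a 45-minute job it is roughly 1 to 25 hours old, matching the ranges quoted.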
| | | jaylevitt 53 days ago | link
There are also the network benefits (pun intended). If the rest of your app does benefit from elasticity, you've had to choose between:
1. Keeping your app on EC2 and working around the lack of high I/O options
2. Keeping your app somewhere else and working around the lack of EC2-style elasticity
In TFA, Netflix had chosen #1, and they used to run an extra memcached layer + I/O on 48 instances. They were able to bring this down to 15 I/O instances with no intervening cache, and lower overall latency. That said, I'd guess the on-demand hi1.4xlarge won't get a lot of usage; I imagine they offer it just for orthogonality's sake (all other instances are available both on-demand and reserved), plus the ability to try before you buy. What's really exciting is that Amazon clearly recognizes their lack of good I/O solutions. Maybe we'll see a whole range of options stem out of this... one can hope. ----- |
| | | XERQ 53 days ago | link
> [edit: A few minutes of Googling, and I can't find any dedicated servers with 2 TB of SSD.]
I am the founder of SSD Nodes, Inc. (www.ssdnodes.com) and we offer lower-sized SSD-backed storage in addition to custom cloud and dedicated plans that range from 1-12TB of local SSD storage, at comparable performance. [/plug] ----- |
| | | corin_ 53 days ago | link
Given your "1-12tb" range doesn't list prices online, can you tell us whether your prices are comparable? At $1249/month for 2x 200GB SSD, it seems unlikely, but maybe I'm wrong?----- |
| | | XERQ 53 days ago | link
Typically clients requiring that much SSD-backed storage have performance targets and a very specific workload, so this affects options along with pricing. With that said, ballpark for what Amazon is advertising is comparable to our pricing, except with us you are _guaranteed_ the resources whereas they are using a multi-tenant environment (you're not using the only instance on their host and your I/O is influenced by everyone else using that host).
EDIT: Downvoted? Please offer your point of view.
EDIT2: Sample pricing offered in comment below. ----- |
| | | corin_ 53 days ago | link
I haven't downvoted (and in fact couldn't if I wanted to since you replied to me), but personally my issue with your comment is that you're being as vague as your website's prices. Would be more interesting and relevant if you actually gave a price for a box comparable to the AWS specs.----- |
| | | XERQ 53 days ago | link
I apologize, here's a sample pricing configuration:
8 x 3.4GHz E3-1270 11