Planet LOD2
June 11 2015
June 10 2015
In Hoc Signo Vinces (part 21 of n): Running TPC-H on Virtuoso Elastic Cluster on Amazon EC2
We have made an Amazon EC2 deployment of Virtuoso 7 Commercial Edition, configured to use the Elastic Cluster Module with TPC-H preconfigured, similar to the recently published OpenLink Virtuoso Benchmark AMI running the Open Source Edition. The details of the new Elastic Cluster AMI and steps to use it will be published in a forthcoming post. Here we will simply look at results of running TPC-H 100G scale on two machines, and 1000G scale on four machines. This shows how Virtuoso provides great performance on a cloud platform. The extremely fast bulk load — 33 minutes for a terabyte! — means that you can get straight to work even with on-demand infrastructure.
In the following, the Amazon instance type is R3.8xlarge, each with dual Xeon E5-2670 v2, 244G RAM, and 2 x 300G SSD. The image is based on Amazon Linux with built-in network optimization. We first tried a RedHat image without network optimization and had considerable trouble with the interconnect; using network-optimized Amazon Linux images inside a virtual private cloud resolved all of these problems.
The network-optimized 10GbE interconnect at Amazon offers throughput close to that of QDR InfiniBand running TCP/IP; thus the Amazon platform is suitable for running cluster databases. The executions we have seen are not seriously network-bound.
100G on 2 machines, with a total of 32 cores, 64 threads, 488 GB RAM, 4 x 300 GB SSD
Load time: 3m 52s

| Run | Power     | Throughput | Composite |
|----:|----------:|-----------:|----------:|
| 1   | 523,554.3 | 590,692.6  | 556,111.2 |
| 2   | 565,353.3 | 642,503.0  | 602,694.9 |
1000G on 4 machines, with a total of 64 cores, 128 threads, 976 GB RAM, 8 x 300 GB SSD
Load time: 32m 47s

| Run | Power     | Throughput | Composite |
|----:|----------:|-----------:|----------:|
| 1   | 592,013.9 | 754,107.6  | 668,163.3 |
| 2   | 896,564.1 | 828,265.4  | 861,738.4 |
| 3   | 883,736.9 | 829,609.0  | 856,245.3 |
For the larger scale we did 3 sets of power + throughput tests to measure consistency of performance. By the TPC-H rules, the worst (first) score should be reported. Even after bulk load, this is markedly less than the next power score due to working set effects. This is seen to a lesser degree with the first throughput score also.
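The Composite column in the tables above is the geometric mean of the Power and Throughput scores, which is easy to check directly. A minimal sketch, using the first 100G run as input:

```python
import math

def composite(power: float, throughput: float) -> float:
    """TPC-H composite metric: geometric mean of Power and Throughput."""
    return math.sqrt(power * throughput)

# First 100G run from the table above
print(round(composite(523554.3, 590692.6), 1))  # 556111.2, matching the table
```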
The numerical summaries are available in a report.zip file, or individually:

- report-100-1.txt
- report-100-2.txt
- report-1000-1.txt
- report-1000-2.txt
- report-1000-3.txt
Subsequent posts will explain how to deploy Virtuoso Elastic Clusters on AWS.
In Hoc Signo Vinces (TPC-H) Series
- In Hoc Signo Vinces (part 1): Virtuoso meets TPC-H
- In Hoc Signo Vinces (part 2): TPC-H Schema Choices
- In Hoc Signo Vinces (part 3): Benchmark Configuration Settings
- In Hoc Signo Vinces (part 4): Bulk Load and Refresh
- In Hoc Signo Vinces (part 5): The Return of SQL Federation
- In Hoc Signo Vinces (part 6): TPC-H Q1 and Q3: An Introduction to Query Plans
- In Hoc Signo Vinces (part 7): TPC-H Q13: The Good and the Bad Plans
- In Hoc Signo Vinces (part 8): TPC-H: INs, Expressions, ORs
- In Hoc Signo Vinces (part 9): TPC-H Q18, Ordered Aggregation, and Top K
- In Hoc Signo Vinces (part 10): TPC-H Q9, Q17, Q20 - Predicate Games
- In Hoc Signo Vinces (part 11): TPC-H Q2, Q10 - Late Projection
- In Hoc Signo Vinces (part 12): TPC-H: Result Preview
- In Hoc Signo Vinces (part 13): Virtuoso TPC-H Kit Now on V7 Fast Track
- In Hoc Signo Vinces (part 14): Virtuoso TPC-H Implementation Analysis
- In Hoc Signo Vinces (part 15): TPC-H and the Science of Hash
- In Hoc Signo Vinces (part 16): Introduction to Scale-Out
- In Hoc Signo Vinces (part 17): 100G and 300G Runs on Dual Xeon E5 2650v2
- In Hoc Signo Vinces (part 18): Cluster Dynamics
- In Hoc Signo Vinces (part 19): Scalability, 1000G, and 3000G
- In Hoc Signo Vinces (part 20): 100G and 1000G With Cluster; When is Cluster Worthwhile; Effects of I/O
- In Hoc Signo Vinces (part 21): Running TPC-H on Virtuoso Cluster on Amazon EC2 (this post)
June 09 2015
Introducing the OpenLink Virtuoso Benchmarks AMI on Amazon EC2
The OpenLink Virtuoso Benchmarks AMI is an Amazon EC2 machine image with the latest Virtuoso open source technology preconfigured to run —
- TPC-H, the classic of SQL data warehousing
- LDBC SNB, the new Social Network Benchmark from the Linked Data Benchmark Council
- LDBC SPB, the RDF/SPARQL Semantic Publishing Benchmark from LDBC
This package is ideal for technology evaluators and developers interested in getting the most performance out of Virtuoso. It is also an all-in-one answer to questions about reproducing claimed benchmark results. All the necessary tools for building and running are included, so any developer can use this model installation as a starting point. The benchmark drivers are preconfigured with appropriate settings, and benchmark qualification tests can be run with a single command.
The Benchmarks AMI includes a precompiled, preconfigured checkout of the v7fasttrack github repository, checkouts of the github repositories of the benchmarks, and a number of running directories with all configuration files preset and optimized. The image is intended to be instantiated on an R3.8xlarge Amazon instance with 244G RAM, dual Xeon E5-2670 v2, and 600G SSD.
Benchmark datasets and preloaded database files can be downloaded from S3 when large, and generated as needed on the instance when small. As an alternative, the instance is also set up to do all phases of data generation and database bulk load.
The following benchmark setups are included:
- TPC-H 100G
- TPC-H 300G
- LDBC SNB Validation
- LDBC SNB Interactive 100G
- LDBC SNB Interactive 300G (SF3)
- LDBC SPB Validation
- LDBC SPB Basic 256 Mtriples (SF5)
- LDBC SPB Basic 1 Gtriple
The AMI will be expanded as new benchmarks are introduced, for example, the LDBC Social Network Business Intelligence or Graph Analytics workloads.
To get started:
- Instantiate machine image ami-5304ef38 (AMI ID is subject to change; you should be able to find the latest by searching for "OpenLink Virtuoso Benchmarks" in "Community AMIs") with an R3.8xlarge instance.
- Connect via ssh.
- See the README (also found in the ec2-user's home directory) for full instructions on getting up and running.
SNB Interactive, Part 3: Choke Points and Initial Run on Virtuoso
In this post we will look at running the LDBC SNB on Virtuoso.
First, let's recap what the benchmark is about:
- fairly frequent short updates, with no update contention worth mentioning
- short random lookups
- medium complex queries centered around a person's social environment
The updates exist so as to invalidate strategies that rely too heavily on precomputation. The short lookups exist for the sake of realism; after all, an online social application does lookups for the most part. The medium complex queries are to challenge the DBMS.
The DBMS challenges have to do firstly with query optimization, and secondly with execution with a lot of non-local random access patterns. Query optimization is not a requirement, per se, since imperative implementations are allowed, but we will see that these are no more free of the laws of nature than the declarative ones.
The workload is arbitrarily parallel, so intra-query parallelization is not particularly useful, though not harmful either. There are latency constraints on operations which strongly encourage implementations to stay within a predictable time envelope regardless of specific query parameters. The parameters are a combination of person and date range, and sometimes tags or countries. The hardest queries have the potential to access all content created by people within 2 steps of a central person, so possibly thousands of people, times 2000 posts per person, times up to 4 tags per post. We are talking in the millions of key lookups, aiming for sub-second single-threaded execution.
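As a back-of-envelope check of those access counts (a sketch; the 1,000-person fan-out is an assumed illustrative figure at the low end of "possibly thousands", not a measured one):

```python
# Hypothetical fan-out for one of the hardest queries
people_within_2_steps = 1_000   # assumed; the text says "possibly thousands"
posts_per_person = 2_000        # from the text
tags_per_post = 4               # upper bound from the text

key_lookups = people_within_2_steps * posts_per_person * tags_per_post
print(key_lookups)  # 8000000 -- already millions of lookups at the low end
```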
The test system is the same as used in the TPC-H series: dual Xeon E5-2630, 2x6 cores x 2 threads, 2.3GHz, 192 GB RAM. The software is the feature/analytics branch of v7fasttrack, available from www.github.com.
The dataset is the SNB 300G set, with:

- 1,136,127 persons
- 125,249,604 knows edges
- 847,886,644 posts, including replies
- 1,145,893,841 tags of posts or replies
- 1,140,226,235 likes of posts or replies
As an initial step, we run the benchmark as fast as it will go. We use 32 threads on the driver side for 24 hardware threads.
Below are the numerical quantities for a 400K operation run after 150K operations worth of warmup.
Duration: 10:41.251
Throughput: 623.71 (op/s)
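The throughput figure is essentially operations divided by wall-clock duration. A quick sanity check, assuming the duration is in mm:ss.fff form:

```python
ops = 400_000
duration_s = 10 * 60 + 41.251   # 10:41.251 read as mm:ss.fff

# Close to the reported 623.71 op/s; small differences come from
# run bookkeeping around the measured window.
print(round(ops / duration_s, 2))
```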
The statistics that matter are detailed below, with operations ranked in order of descending client-side wait-time. All times are in milliseconds.
| % of total | total_wait | name                          | count  | mean     | min   | max    |
|-----------:|-----------:|-------------------------------|-------:|---------:|------:|-------:|
| 20 %       | 4,231,130  | LdbcQuery5                    | 656    | 6,449.89 | 245   | 10,311 |
| 11 %       | 2,272,954  | LdbcQuery8                    | 18,354 | 123.84   | 14    | 2,240  |
| 10 %       | 2,200,718  | LdbcQuery3                    | 388    | 5,671.95 | 468   | 17,368 |
| 7.3 %      | 1,561,382  | LdbcQuery14                   | 1,124  | 1,389.13 | 4     | 5,724  |
| 6.7 %      | 1,441,575  | LdbcQuery12                   | 1,252  | 1,151.42 | 15    | 3,273  |
| 6.5 %      | 1,396,932  | LdbcQuery10                   | 1,252  | 1,115.76 | 13    | 4,743  |
| 5 %        | 1,064,457  | LdbcShortQuery3PersonFriends  | 46,285 | 22.9979  | 0     | 2,287  |
| 4.9 %      | 1,047,536  | LdbcShortQuery2PersonPosts    | 46,285 | 22.6323  | 0     | 2,156  |
| 4.1 %      | 885,102    | LdbcQuery6                    | 1,721  | 514.295  | 8     | 5,227  |
| 3.3 %      | 707,901    | LdbcQuery1                    | 2,117  | 334.389  | 28    | 3,467  |
| 2.4 %      | 521,738    | LdbcQuery4                    | 1,530  | 341.005  | 49    | 2,774  |
| 2.1 %      | 440,197    | LdbcShortQuery4MessageContent | 46,302 | 9.50708  | 0     | 2,015  |
| 1.9 %      | 407,450    | LdbcUpdate5AddForumMembership | 14,338 | 28.4175  | 0     | 2,008  |
| 1.9 %      | 405,243    | LdbcShortQuery7MessageReplies | 46,302 | 8.75217  | 0     | 2,112  |
| 1.9 %      | 404,002    | LdbcShortQuery6MessageForum   | 46,302 | 8.72537  | 0     | 1,968  |
| 1.8 %      | 387,044    | LdbcUpdate3AddCommentLike     | 12,659 | 30.5746  | 0     | 2,060  |
| 1.7 %      | 361,290    | LdbcShortQuery1PersonProfile  | 46,285 | 7.80577  | 0     | 2,015  |
| 1.6 %      | 334,409    | LdbcShortQuery5MessageCreator | 46,302 | 7.22234  | 0     | 2,055  |
| 1 %        | 220,740    | LdbcQuery2                    | 1,488  | 148.347  | 2     | 2,504  |
| 0.96 %     | 205,910    | LdbcQuery7                    | 1,721  | 119.646  | 11    | 2,295  |
| 0.93 %     | 198,971    | LdbcUpdate2AddPostLike        | 5,974  | 33.3062  | 0     | 1,987  |
| 0.88 %     | 189,871    | LdbcQuery11                   | 2,294  | 82.7685  | 4     | 2,219  |
| 0.85 %     | 182,964    | LdbcQuery13                   | 2,898  | 63.1346  | 1     | 2,201  |
| 0.74 %     | 158,188    | LdbcQuery9                    | 78     | 2,028.05 | 1,108 | 4,183  |
| 0.67 %     | 143,457    | LdbcUpdate7AddComment         | 3,986  | 35.9902  | 1     | 1,912  |
| 0.26 %     | 54,947     | LdbcUpdate8AddFriendship      | 571    | 96.2294  | 1     | 988    |
| 0.2 %      | 43,451     | LdbcUpdate6AddPost            | 1,386  | 31.3499  | 1     | 2,060  |
| 0.0086 %   | 1,848      | LdbcUpdate4AddForum           | 103    | 17.9417  | 1     | 65     |
| 0.0002 %   | 44         | LdbcUpdate1AddPerson          | 2      | 22       | 10    | 34     |
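In these statistics, the mean column is simply total_wait divided by count, which makes the figures easy to verify. A minimal sketch using the LdbcQuery5 row:

```python
total_wait_ms = 4_231_130  # LdbcQuery5 total client-side wait, in ms
count = 656                # number of LdbcQuery5 executions

mean_ms = total_wait_ms / count
print(round(mean_ms, 2))  # 6449.89, matching the reported mean
```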
At this point we have in-depth knowledge of the choke points the benchmark stresses, and we can give a first assessment of whether the design meets its objectives for setting an agenda for the coming years of graph database development.
The implementation is well optimized in general but still has maybe 30% room for improvement. We note that this is based on a compressed column store. One could think that alternative data representations, like in-memory graphs of structs and pointers between them, are better for the task. This is not necessarily so; at the least, a compressed column store is much more space efficient. Space efficiency is the root of cost efficiency, since as soon as the working set is not in memory, a random access workload is badly hit.
The set of choke points (technical challenges) actually revealed by the benchmark is, so far, as follows:
- Cardinality estimation under heavy data skew — Many queries take a tag or a country as a parameter. The cardinalities associated with tags vary from 29M posts for the most common to 1 for the least common. Q6 has a common tag (in top few hundred) half the time and a ran