Planet LOD2
June 11 2015
June 10 2015
In Hoc Signo Vinces (part 21 of n): Running TPC-H on Virtuoso Elastic Cluster on Amazon EC2
We have made an Amazon EC2 deployment of Virtuoso 7 Commercial Edition, configured to use the Elastic Cluster Module with TPC-H preconfigured, similar to the recently published OpenLink Virtuoso Benchmark AMI running the Open Source Edition. The details of the new Elastic Cluster AMI and steps to use it will be published in a forthcoming post. Here we will simply look at results of running TPC-H 100G scale on two machines, and 1000G scale on four machines. This shows how Virtuoso provides great performance on a cloud platform. The extremely fast bulk load — 33 minutes for a terabyte! — means that you can get straight to work even with on-demand infrastructure.
In the following, the Amazon instance type is R3.8xlarge, each with dual Xeon E5-2670 v2, 244G RAM, and 2 x 300G SSD. The image is based on Amazon Linux with built-in network optimization. We first tried a RedHat image without network optimization and had considerable trouble with the interconnect; using network-optimized Amazon Linux images inside a virtual private cloud resolved all of these problems.
The network-optimized 10GbE interconnect at Amazon offers throughput close to that of QDR InfiniBand running TCP/IP; thus the Amazon platform is suitable for running cluster databases. The executions we have seen are not seriously network-bound.
100G on 2 machines, with a total of 32 cores, 64 threads, 488 GB RAM, 4 x 300 GB SSD
Load time: 3m 52s

| Run | Power     | Throughput | Composite |
|----:|----------:|-----------:|----------:|
| 1   | 523,554.3 | 590,692.6  | 556,111.2 |
| 2   | 565,353.3 | 642,503.0  | 602,694.9 |
1000G on 4 machines, with a total of 64 cores, 128 threads, 976 GB RAM, 8 x 300 GB SSD
Load time: 32m 47s

| Run | Power     | Throughput | Composite |
|----:|----------:|-----------:|----------:|
| 1   | 592,013.9 | 754,107.6  | 668,163.3 |
| 2   | 896,564.1 | 828,265.4  | 861,738.4 |
| 3   | 883,736.9 | 829,609.0  | 856,245.3 |
For the larger scale we did 3 sets of power + throughput tests to measure consistency of performance. By the TPC-H rules, the worst (first) score should be reported. Even after bulk load, this is markedly less than the next power score due to working set effects. This is seen to a lesser degree with the first throughput score also.
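The Composite column in the tables above is the geometric mean of the Power and Throughput scores, which is easy to check directly. A minimal sketch, using the first 100G run as input:

```python
import math

def composite(power: float, throughput: float) -> float:
    """TPC-H composite metric: geometric mean of Power and Throughput."""
    return math.sqrt(power * throughput)

# First 100G run from the table above
print(round(composite(523554.3, 590692.6), 1))  # 556111.2, matching the table
```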
The numerical summaries are available in a report.zip file, or individually:

- report-100-1.txt
- report-100-2.txt
- report-1000-1.txt
- report-1000-2.txt
- report-1000-3.txt
Subsequent posts will explain how to deploy Virtuoso Elastic Clusters on AWS.
In Hoc Signo Vinces (TPC-H) Series
- In Hoc Signo Vinces (part 1): Virtuoso meets TPC-H
- In Hoc Signo Vinces (part 2): TPC-H Schema Choices
- In Hoc Signo Vinces (part 3): Benchmark Configuration Settings
- In Hoc Signo Vinces (part 4): Bulk Load and Refresh
- In Hoc Signo Vinces (part 5): The Return of SQL Federation
- In Hoc Signo Vinces (part 6): TPC-H Q1 and Q3: An Introduction to Query Plans
- In Hoc Signo Vinces (part 7): TPC-H Q13: The Good and the Bad Plans
- In Hoc Signo Vinces (part 8): TPC-H: INs, Expressions, ORs
- In Hoc Signo Vinces (part 9): TPC-H Q18, Ordered Aggregation, and Top K
- In Hoc Signo Vinces (part 10): TPC-H Q9, Q17, Q20 - Predicate Games
- In Hoc Signo Vinces (part 11): TPC-H Q2, Q10 - Late Projection
- In Hoc Signo Vinces (part 12): TPC-H: Result Preview
- In Hoc Signo Vinces (part 13): Virtuoso TPC-H Kit Now on V7 Fast Track
- In Hoc Signo Vinces (part 14): Virtuoso TPC-H Implementation Analysis
- In Hoc Signo Vinces (part 15): TPC-H and the Science of Hash
- In Hoc Signo Vinces (part 16): Introduction to Scale-Out
- In Hoc Signo Vinces (part 17): 100G and 300G Runs on Dual Xeon E5 2650v2
- In Hoc Signo Vinces (part 18): Cluster Dynamics
- In Hoc Signo Vinces (part 19): Scalability, 1000G, and 3000G
- In Hoc Signo Vinces (part 20): 100G and 1000G With Cluster; When is Cluster Worthwhile; Effects of I/O
- In Hoc Signo Vinces (part 21): Running TPC-H on Virtuoso Cluster on Amazon EC2 (this post)
June 09 2015
Introducing the OpenLink Virtuoso Benchmarks AMI on Amazon EC2
The OpenLink Virtuoso Benchmarks AMI is an Amazon EC2 machine image with the latest Virtuoso open source technology preconfigured to run —
- TPC-H, the classic of SQL data warehousing
- LDBC SNB, the new Social Network Benchmark from the Linked Data Benchmark Council
- LDBC SPB, the RDF/SPARQL Semantic Publishing Benchmark from LDBC
This package is ideal for technology evaluators and developers interested in getting the most performance out of Virtuoso. It is also an all-in-one answer to questions about reproducing claimed benchmark results. All the necessary tools for building and running are included, so any developer can use this model installation as a starting point. The benchmark drivers are preconfigured with appropriate settings, and benchmark qualification tests can be run with a single command.
The Benchmarks AMI includes a precompiled, preconfigured checkout of the v7fasttrack github repository, checkouts of the github repositories of the benchmarks, and a number of running directories with all configuration files preset and optimized. The image is intended to be instantiated on an R3.8xlarge Amazon instance with 244G RAM, dual Xeon E5-2670 v2, and 600G SSD.
Benchmark datasets and preloaded database files can be downloaded from S3 when large, and generated as needed on the instance when small. As an alternative, the instance is also set up to do all phases of data generation and database bulk load.
The following benchmark setups are included:
- TPC-H 100G
- TPC-H 300G
- LDBC SNB Validation
- LDBC SNB Interactive 100G
- LDBC SNB Interactive 300G (SF3)
- LDBC SPB Validation
- LDBC SPB Basic 256 Mtriples (SF5)
- LDBC SPB Basic 1 Gtriple
The AMI will be expanded as new benchmarks are introduced, for example, the LDBC Social Network Business Intelligence or Graph Analytics workloads.
To get started:
- Instantiate machine image ami-5304ef38 (AMI ID is subject to change; you should be able to find the latest by searching for "OpenLink Virtuoso Benchmarks" in "Community AMIs") with an R3.8xlarge instance.
- Connect via ssh.
- See the README (also found in the ec2-user's home directory) for full instructions on getting up and running.
SNB Interactive, Part 3: Choke Points and Initial Run on Virtuoso
In this post we will look at running the LDBC SNB on Virtuoso.
First, let's recap what the benchmark is about:
- fairly frequent short updates, with no update contention worth mentioning
- short random lookups
- medium complex queries centered around a person's social environment
The updates exist so as to invalidate strategies that rely too heavily on precomputation. The short lookups exist for the sake of realism; after all, an online social application does lookups for the most part. The medium complex queries are to challenge the DBMS.
The DBMS challenges have to do firstly with query optimization, and secondly with execution with a lot of non-local random access patterns. Query optimization is not a requirement, per se, since imperative implementations are allowed, but we will see that these are no more free of the laws of nature than the declarative ones.
The workload is arbitrarily parallel, so intra-query parallelization is not particularly useful, though not harmful either. There are latency constraints on operations which strongly encourage implementations to stay within a predictable time envelope regardless of specific query parameters. The parameters are a combination of person and date range, and sometimes tags or countries. The hardest queries have the potential to access all content created by people within 2 steps of a central person, so possibly thousands of people, times 2000 posts per person, times up to 4 tags per post. We are talking in the millions of key lookups, aiming for sub-second single-threaded execution.
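As a back-of-envelope check of those access counts (a sketch; the 1,000-person fan-out is an assumed illustrative figure at the low end of "possibly thousands", not a measured one):

```python
# Hypothetical fan-out for one of the hardest queries
people_within_2_steps = 1_000   # assumed; the text says "possibly thousands"
posts_per_person = 2_000        # from the text
tags_per_post = 4               # upper bound from the text

key_lookups = people_within_2_steps * posts_per_person * tags_per_post
print(key_lookups)  # 8000000 -- already millions of lookups at the low end
```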
The test system is the same as used in the TPC-H series: dual Xeon E5-2630, 2x6 cores x 2 threads, 2.3GHz, 192 GB RAM. The software is the feature/analytics branch of v7fasttrack, available from www.github.com.
The dataset is the SNB 300G set, with:

- 1,136,127 persons
- 125,249,604 knows edges
- 847,886,644 posts, including replies
- 1,145,893,841 tags of posts or replies
- 1,140,226,235 likes of posts or replies
As an initial step, we run the benchmark as fast as it will go. We use 32 threads on the driver side for 24 hardware threads.
Below are the numerical quantities for a 400K operation run after 150K operations worth of warmup.
Duration: 10:41.251
Throughput: 623.71 (op/s)
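The throughput figure is essentially operations divided by wall-clock duration. A quick sanity check, assuming the duration is in mm:ss.fff form:

```python
ops = 400_000
duration_s = 10 * 60 + 41.251   # 10:41.251 read as mm:ss.fff

# Close to the reported 623.71 op/s; small differences come from
# run bookkeeping around the measured window.
print(round(ops / duration_s, 2))
```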
The statistics that matter are detailed below, with operations ranked in order of descending client-side wait-time. All times are in milliseconds.
| % of total | total_wait | name                          | count  | mean     | min   | max    |
|-----------:|-----------:|-------------------------------|-------:|---------:|------:|-------:|
| 20 %       | 4,231,130  | LdbcQuery5                    | 656    | 6,449.89 | 245   | 10,311 |
| 11 %       | 2,272,954  | LdbcQuery8                    | 18,354 | 123.84   | 14    | 2,240  |
| 10 %       | 2,200,718  | LdbcQuery3                    | 388    | 5,671.95 | 468   | 17,368 |
| 7.3 %      | 1,561,382  | LdbcQuery14                   | 1,124  | 1,389.13 | 4     | 5,724  |
| 6.7 %      | 1,441,575  | LdbcQuery12                   | 1,252  | 1,151.42 | 15    | 3,273  |
| 6.5 %      | 1,396,932  | LdbcQuery10                   | 1,252  | 1,115.76 | 13    | 4,743  |
| 5 %        | 1,064,457  | LdbcShortQuery3PersonFriends  | 46,285 | 22.9979  | 0     | 2,287  |
| 4.9 %      | 1,047,536  | LdbcShortQuery2PersonPosts    | 46,285 | 22.6323  | 0     | 2,156  |
| 4.1 %      | 885,102    | LdbcQuery6                    | 1,721  | 514.295  | 8     | 5,227  |
| 3.3 %      | 707,901    | LdbcQuery1                    | 2,117  | 334.389  | 28    | 3,467  |
| 2.4 %      | 521,738    | LdbcQuery4                    | 1,530  | 341.005  | 49    | 2,774  |
| 2.1 %      | 440,197    | LdbcShortQuery4MessageContent | 46,302 | 9.50708  | 0     | 2,015  |
| 1.9 %      | 407,450    | LdbcUpdate5AddForumMembership | 14,338 | 28.4175  | 0     | 2,008  |
| 1.9 %      | 405,243    | LdbcShortQuery7MessageReplies | 46,302 | 8.75217  | 0     | 2,112  |
| 1.9 %      | 404,002    | LdbcShortQuery6MessageForum   | 46,302 | 8.72537  | 0     | 1,968  |
| 1.8 %      | 387,044    | LdbcUpdate3AddCommentLike     | 12,659 | 30.5746  | 0     | 2,060  |
| 1.7 %      | 361,290    | LdbcShortQuery1PersonProfile  | 46,285 | 7.80577  | 0     | 2,015  |
| 1.6 %      | 334,409    | LdbcShortQuery5MessageCreator | 46,302 | 7.22234  | 0     | 2,055  |
| 1 %        | 220,740    | LdbcQuery2                    | 1,488  | 148.347  | 2     | 2,504  |
| 0.96 %     | 205,910    | LdbcQuery7                    | 1,721  | 119.646  | 11    | 2,295  |
| 0.93 %     | 198,971    | LdbcUpdate2AddPostLike        | 5,974  | 33.3062  | 0     | 1,987  |
| 0.88 %     | 189,871    | LdbcQuery11                   | 2,294  | 82.7685  | 4     | 2,219  |
| 0.85 %     | 182,964    | LdbcQuery13                   | 2,898  | 63.1346  | 1     | 2,201  |
| 0.74 %     | 158,188    | LdbcQuery9                    | 78     | 2,028.05 | 1,108 | 4,183  |
| 0.67 %     | 143,457    | LdbcUpdate7AddComment         | 3,986  | 35.9902  | 1     | 1,912  |
| 0.26 %     | 54,947     | LdbcUpdate8AddFriendship      | 571    | 96.2294  | 1     | 988    |
| 0.2 %      | 43,451     | LdbcUpdate6AddPost            | 1,386  | 31.3499  | 1     | 2,060  |
| 0.0086 %   | 1,848      | LdbcUpdate4AddForum           | 103    | 17.9417  | 1     | 65     |
| 0.0002 %   | 44         | LdbcUpdate1AddPerson          | 2      | 22       | 10    | 34     |
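In these statistics, the mean column is simply total_wait divided by count, which makes the figures easy to verify. A minimal sketch using the LdbcQuery5 row:

```python
total_wait_ms = 4_231_130  # LdbcQuery5 total client-side wait, in ms
count = 656                # number of LdbcQuery5 executions

mean_ms = total_wait_ms / count
print(round(mean_ms, 2))  # 6449.89, matching the reported mean
```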
At this point we have in-depth knowledge of the choke points the benchmark stresses, and we can give a first assessment of whether the design meets its objectives for setting an agenda for the coming years of graph database development.
The implementation is well optimized in general but still has maybe 30% room for improvement. We note that this is based on a compressed column store. One could think that alternative data representations, like in-memory graphs of structs and pointers between them, are better for the task. This is not necessarily so; at the least, a compressed column store is much more space efficient. Space efficiency is the root of cost efficiency, since as soon as the working set is not in memory, a random access workload is badly hit.
The set of choke points (technical challenges) actually revealed by the benchmark is, so far, as follows:
- Cardinality estimation under heavy data skew — Many queries take a tag or a country as a parameter. The cardinalities associated with tags vary from 29M posts for the most common to 1 for the least common. Q6 has a common tag (in top few hundred) half the time and a ran