[Author: Geoff Chappell Last Update: 2004-11-04]
Performance Statistics for RDF Gateway

This page details some performance measurements of RDF Gateway. All tests were run on a Dell 4600 running Windows 2000 Server with two 3 GHz Xeon processors, 4 GBytes of RAM, and 300 GBytes of hard-drive storage (RAID 5). RDF Gateway version 2.2 (pre-release) was used for the tests.

The following table details limitations in the tested version of RDF Gateway.

Maximum disk table size: 2 Terabytes
Maximum number of individual URIs or literals per disk table: 2 billion

Test 1: Bulk Load of Triples into Disk Table

This test measured the time it takes to bulk load a quantity of triples into a disk-based RDF Gateway table. The table was created without context (meaning it could hold triples, not quads), with full-text indexing enabled, and with transaction logging disabled. The source for the load was a 255 MByte N-Triples file containing 2,069,663 triples.

The following script was used for the test:

create table testload context non (NOTRANS)
var ds = new datasource("inet?parsetype=ntriples&convert_types=yes&url=e:/tempdata/geo.rdf");
load {?p ?s ?o} into testload using #ds;
Results

Notes on test: The data was datatype-intensive, which slows the import somewhat. Using a quad table would decrease performance by ~50% and increase the table size by approximately the same amount.
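The figures quoted for this test allow some simple derived numbers. The following Python sketch is only an illustration, not part of the original test: it computes the average source bytes per triple from the stated file size and triple count, and provides a helper that converts a measured load time into triples per second. The function names are hypothetical, and the sketch assumes 1 MByte = 2**20 bytes.

```python
# Figures taken from the test description above.
FILE_SIZE_MBYTES = 255
TRIPLE_COUNT = 2_069_663


def bytes_per_triple(size_mbytes: float, triples: int, mbyte: int = 1024 * 1024) -> float:
    """Average source-file bytes per triple (assumes 1 MByte = 2**20 bytes)."""
    return size_mbytes * mbyte / triples


def triples_per_second(triples: int, elapsed_seconds: float) -> float:
    """Load throughput for a measured elapsed time (supplied by the caller)."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return triples / elapsed_seconds


if __name__ == "__main__":
    avg = bytes_per_triple(FILE_SIZE_MBYTES, TRIPLE_COUNT)
    print(f"{avg:.1f} bytes per triple in the source file")
```

To obtain a throughput figure, pass the elapsed load time measured on your own system to triples_per_second; no elapsed time is assumed here.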

Test 2: Select the properties of a single resource

This test measured the time it takes to repeatedly query the properties of a single resource. It is intended to measure the overhead of a simple query. The query was run 10,000 times (sequentially) in order to find an average query duration. The table loaded in the previous test is used for the query (~2M triples).
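The averaging method described above (run the same query many times in sequence and divide the total elapsed time by the number of runs) can be sketched in Python as follows. This is a generic illustration, not the harness used for the test; average_duration and stub_query are hypothetical names, and the stub stands in for whatever call actually sends the query to the server.

```python
import time
from typing import Callable


def average_duration(run_query: Callable[[], object], iterations: int = 10_000) -> float:
    """Run run_query sequentially and return the mean seconds per call."""
    if iterations <= 0:
        raise ValueError("iterations must be positive")
    start = time.perf_counter()
    for _ in range(iterations):
        run_query()
    return (time.perf_counter() - start) / iterations


def stub_query() -> list:
    # Hypothetical stand-in; a real harness would submit the query to the server.
    return []


if __name__ == "__main__":
    avg = average_duration(stub_query, iterations=10_000)
    print(f"average: {avg * 1000:.4f} ms per query")
```

Timing the whole loop rather than each call keeps the timer overhead out of the per-query figure, which matters when the individual query is very short.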

The following script was used for the test:

for (var i=0; i<10000; i++) (select ?p ?o using testload where {?p [http://www.census.gov/tiger/2002/CFCC/B12] ?o});
Results

Test 3: Select the properties of a single resource with inference

This test measured the time it takes to repeatedly query for the subclasses of a particular class. A rulebase is included in the query that expresses the transitivity of the subclass relationship. The query was run 10,000 times (sequentially) in order to find an average query duration. The table loaded in Test 1 is used for the query (~2M triples).
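To illustrate what the transitivity rule described above computes, the following Python sketch walks the subclass relationship over a small in-memory set of triples. It is a conceptual illustration only and says nothing about how RDF Gateway evaluates rulebases internally. Triples are written as (predicate, subject, object), following the {?p ?s ?o} order used in the scripts on this page. The function name, the shortened class names, and the "ex:Railroad" class are hypothetical.

```python
from typing import Dict, Set, Tuple

SUBCLASS = "rdfs:subClassOf"
Triple = Tuple[str, str, str]  # (predicate, subject, object)


def superclasses(triples: Set[Triple], start: str) -> Set[str]:
    """Return every class reachable from start via direct or inferred subClassOf."""
    direct: Dict[str, Set[str]] = {}
    for p, s, o in triples:
        if p == SUBCLASS:
            direct.setdefault(s, set()).add(o)
    found: Set[str] = set()
    stack = list(direct.get(start, ()))
    while stack:
        cls = stack.pop()
        if cls not in found:  # the visited set also guards against cycles
            found.add(cls)
            stack.extend(direct.get(cls, ()))
    return found


if __name__ == "__main__":
    data = {
        (SUBCLASS, "CFCC/B12", "CFCC/B1"),
        (SUBCLASS, "CFCC/B1", "ex:Railroad"),  # hypothetical triple for illustration
    }
    print(sorted(superclasses(data, "CFCC/B12")))
```

With the two triples shown, the result for "CFCC/B12" contains both "CFCC/B1" (stated directly) and "ex:Railroad" (inferred through the rule).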

The following script was used for the test:

import "/std/ns.rql";
rulebase test {
    infer {[rdfs:subClassOf] ?x ?z} from {[rdfs:subClassOf] ?x ?y} and {[rdfs:subClassOf] ?y ?z};
};
for (var i=0; i<10000; i++)
    (select ?o using testload rulebase test where {[rdfs:subClassOf] [http://www.census.gov/tiger/2002/CFCC/B12] ?o});
Results

Test 4: Perform a SELECT with many conditions with varying result limits

This test performs a query with a number of conditions that matches a large quantity of results. The result set is limited to different counts (top 100 and top 10). The query was run 1,000 times (sequentially) in order to find an average query duration. The table loaded in Test 1 is used for the query (~2M triples).

The following script was used for the test:

import "/std/ns.rql";
for (var i=0; i<1000; i++) {
    (select top 100 ?feature_name ?feature_type_label using geo where
        {[rdf:type] ?county [http://www.census.gov/tiger/2002/fips/area_description/06]}
        and {[tiger:name] ?county 'Addison'}
        and {[tiger:area] ?county ?area}
        and ({[tiger:atLeft] ?line ?area} or {[tiger:atRight] ?line ?area})
        and {[tiger:path] ?feature ?line}
        and {[tiger:name] ?feature ?feature_name}
        and {[rdf:type] ?feature ?feature_type}
        and {[rdfs:label] ?feature_type ?feature_type_label}
        order by ?feature_name);
}
Results for top 100
Results for top 10

Notes on test: There is 1 county matching the first two conditions; there are 2097 areas matching the first three conditions; there are 9536 lines matching the first four conditions; there are 9671 features matching all conditions.

Test 5: Simultaneous querying by a number of users

This test uses the Microsoft Web Application Stress Tool to access a page served by RDF Gateway via HTTP with a variable number of users. The page performs the single query shown below and returns the results as RDF/XML (simulating a simple web service). This is a test of both the HTTP interface and simultaneous access to the triplestore. The table loaded in Test 1 is used for the query (~2M triples). The stress tool was run on a separate machine and connected to the test server via a 100 Mbit Ethernet network. Each test run lasted 60 seconds with a varied number of threads (simulating users).

The following script was used for the test:

var rs = (select ?p ?s ?o using testload where ?s=[http://www.census.gov/tiger/2002/CFCC/B12] and {?p ?s ?o});
response.write(datasource(rs).format());

Each page request returns the following result:

<?xml version="1.0" ?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:ns1="http://www.w3.org/2000/01/rdf-schema#">
  <ns1:Class rdf:about="http://www.census.gov/tiger/2002/CFCC/B12">
    <ns1:label>Railroad main track, in tunnel</ns1:label>
    <ns1:subClassOf rdf:resource="http://www.census.gov/tiger/2002/CFCC/B1" />
  </ns1:Class>
</rdf:RDF>
Results for 1 thread
Results for 2 threads
Results for 5 threads
Results for 25 threads

Notes on test: There is an expected increase in requests per second when going from 1 to 2 threads (since the tested machine has 2 processors). Thereafter, the requests compete for processor time, so there is some drop-off as the number of threads increases. Since the query only acquires a read lock, there is no database contention (either expected or evident).
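The load pattern used in Test 5 (a fixed-length run in which a given number of threads issue requests back to back) can be sketched in Python as shown below. This is not the Microsoft Web Application Stress Tool and does not reproduce its behavior; requests_per_second and stub_request are hypothetical names, and the stub sleeps briefly in place of a real HTTP request to the test page.

```python
import threading
import time
from typing import Callable


def requests_per_second(send_request: Callable[[], None], threads: int, duration_s: float) -> float:
    """Run `threads` workers that call send_request in a loop for duration_s seconds."""
    if threads <= 0 or duration_s <= 0:
        raise ValueError("threads and duration_s must be positive")
    counts = [0] * threads  # one slot per worker, so no lock is needed
    start = time.monotonic()
    deadline = start + duration_s

    def worker(index: int) -> None:
        while time.monotonic() < deadline:
            send_request()
            counts[index] += 1

    workers = [threading.Thread(target=worker, args=(i,)) for i in range(threads)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    elapsed = time.monotonic() - start
    return sum(counts) / elapsed


def stub_request() -> None:
    # Hypothetical stand-in for an HTTP GET against the test page.
    time.sleep(0.001)


if __name__ == "__main__":
    for n in (1, 2, 5, 25):  # thread counts used in the test above
        rate = requests_per_second(stub_request, threads=n, duration_s=1.0)
        print(f"{n} threads: {rate:.0f} requests/s")
```

Because the workers spend their time waiting on the request rather than computing, Python threads are adequate for a client-side sketch like this. The rates printed with the stub reflect only the sleep interval and say nothing about RDF Gateway itself.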