[Author: Geoff Chappell Last Update: 2004-10-31]
Experiments with TIGER/line census data and RDF

Overview

TIGER/line census data is published by the US Census Bureau and is available for download from the Bureau's website. The data describe legal and statistical entities such as towns, counties, and states; landmarks such as churches, hospitals, and cemeteries; and geographical features such as roads, rivers, and railroads. The area, path, or location of each entity is described by polygons, lines, or points respectively, all expressed in longitude and latitude.
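The entity model described above can be pictured as a few plain data classes: an area is a polygon, a path is a sequence of lines, and a location is a single point. The Python sketch below is only an illustrative assumption about how that geometry might be organized in memory. It is not the layout of the TIGER/line files or of the RDF produced by the conversion scripts, and the class names (`Point`, `Line`, `Polygon`, `Path`, `Entity`) are hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Point:
    """A location: longitude and latitude in decimal degrees."""
    long: float
    lat: float


@dataclass(frozen=True)
class Line:
    """A straight segment between two points."""
    start: Point
    end: Point


@dataclass(frozen=True)
class Polygon:
    """An area enclosed by its boundary lines."""
    boundary: Tuple[Line, ...]


@dataclass(frozen=True)
class Path:
    """A linear feature (road, river, railroad) made of connected lines."""
    lines: Tuple[Line, ...]


@dataclass(frozen=True)
class Entity:
    """A named entity whose geometry is an area, a path, or a location."""
    name: str
    geometry: Union[Polygon, Path, Point]

    def geometry_kind(self) -> str:
        # Map the geometry type back to the three kinds named in the overview.
        if isinstance(self.geometry, Polygon):
            return "area"
        if isinstance(self.geometry, Path):
            return "path"
        return "location"
```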

The primary goal of this experiment is to generate a large body of geo-spatial data by converting the TIGER/line data to RDF, and to demonstrate RDF Gateway's capabilities for transforming, storing, and querying such data. A secondary goal is to make this particular data available to the semantic web community, since others have expressed a desire to see it as RDF. For now, only the conversion scripts (plus some example queries) are available; in the future, the converted data itself or a query service (SPARQL?) against it may also be made available.

The Conversion Process

The census data is published per county. Each county's distribution zip file contains a number of text files of fixed-length records, and the TIGER/line documentation details the layout of each file.
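Reading a fixed-length record file amounts to slicing each line at known column positions. The Python sketch below assumes a hypothetical four-field layout (`LAYOUT`); the real field names and column ranges for each record type are given in the TIGER/line documentation. It also assumes coordinates are stored as signed integers with six implied decimal places, which is consistent with the 1000000/72 units-per-mile constant used in Example 6. The file encoding is likewise an assumption.

```python
# Hypothetical layout: (field name, first column, last column), 1-based and inclusive.
# These positions are placeholders; the real ones come from the TIGER/line documentation.
LAYOUT = [
    ("id", 1, 5),
    ("name", 6, 20),
    ("long", 21, 30),
    ("lat", 31, 39),
]


def parse_record(record: str, layout=LAYOUT) -> dict:
    """Slice one fixed-length record into named fields, trimming the padding."""
    return {name: record[first - 1:last].strip() for name, first, last in layout}


def decode_coordinate(raw: str) -> float:
    """Assumed encoding: a signed integer with six implied decimal places."""
    return int(raw) / 1_000_000


def read_records(path: str, layout=LAYOUT):
    """Yield one parsed dict per line of a fixed-length record file."""
    # latin-1 is an assumption; it never fails to decode a byte.
    with open(path, encoding="latin-1") as f:
        for record in f:
            yield parse_record(record.rstrip("\r\n"), layout)
```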

The data is relatively static, so I chose to write conversion scripts that transform the fixed-length record files into RDF. The scripts are written in RDFQL, RDF Gateway's dynamic language, which mixes JavaScript with a deductive database command/query language. If the data were more dynamic, I might instead have written a data service (a custom software component that interfaces a data source with RDF Gateway) and connected to the data as a live source.
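As an illustration of the transformation step (not a port of the RDFQL scripts), the Python sketch below writes N-Triples for one line segment using the `tiger:name`, `tiger:start`, `tiger:end`, `tiger:lat`, and `tiger:long` properties that appear in the queries. The namespace URIs and the `node_uri` naming scheme are hypothetical placeholders, and coordinates are kept as integers in millionths of a degree.

```python
# Hypothetical namespaces for illustration; the converter's real URIs are not shown here.
TIGER = "http://example.org/tiger#"
BASE = "http://example.org/tiger/data/"
XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"


def literal(value) -> str:
    """Render a value as an N-Triples literal: integers typed, strings escaped."""
    if isinstance(value, int):
        return f'"{value}"^^<{XSD_INTEGER}>'
    text = str(value).replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{text}"'


def node_uri(lon: int, lat: int) -> str:
    """Mint one URI per coordinate pair so lines sharing an end point share a node."""
    return f"<{BASE}node/{lon}_{lat}>"


def line_to_triples(line_id: str, name: str, start, end):
    """Yield N-Triples statements for a line and its two end points.

    start and end are (long, lat) integer pairs in millionths of a degree.
    """
    line = f"<{BASE}line/{line_id}>"
    yield f"{line} <{TIGER}name> {literal(name)} ."
    for prop, (lon, lat) in (("start", start), ("end", end)):
        node = node_uri(lon, lat)
        yield f"{line} <{TIGER}{prop}> {node} ."
        yield f"{node} <{TIGER}long> {literal(lon)} ."
        yield f"{node} <{TIGER}lat> {literal(lat)} ."
```

Deriving node URIs from coordinates means lines that meet at a point share one node, which is what lets a query walk from a point to every line that starts or ends there. A node shared by several lines has its `long`/`lat` triples emitted more than once; a triple store simply treats the repeats as the same statement.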

The conversion process roughly followed these steps (with a few refinement iterations):

Running the completed scripts on a state such as Vermont yields nearly 2 million triples, which is sufficient data for developing queries. Note that I skipped converting some of the data (shape nodes for lines, address and zip ranges for roads, alternative names for features, etc.). I'll likely add some or all of it at some point.

Source Files

Queries

Below are some sample queries I developed against the TIGER/line data. All of the queries assume the data has been imported into a table named geo.

Example 1 - Find the polygons that describe the area of Addison County

select ?poly using geo where
	{[rdf:type] ?county [http://www.census.gov/tiger/2002/fips/area_description/06]}
	and {[tiger:name] ?county 'Addison'}
	and {[tiger:area] ?county ?poly}

Example 2 - Find up to 100 linear features in Addison County

select top 100 ?feature_name ?feature_type_label using geo where
	{[rdf:type] ?county [http://www.census.gov/tiger/2002/fips/area_description/06]}
	and {[tiger:name] ?county 'Addison'}
	and {[tiger:area] ?county ?area}
	and ({[tiger:atLeft] ?line ?area} or {[tiger:atRight] ?line ?area})
	and {[tiger:path] ?feature ?line}
	and {[tiger:name] ?feature ?feature_name}
	and {[rdf:type] ?feature ?feature_type}
	and {[rdfs:label] ?feature_type ?feature_type_label}
order by ?feature_name
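The join in Example 2 can be traced step by step over an in-memory set of triples: from the county to its area polygons, from the polygons to the lines on their left or right, and from those lines to the features whose path uses them. This Python sketch is illustrative only and says nothing about how RDF Gateway evaluates the query. Triples are stored as (predicate, subject, object) tuples to mirror RDFQL's `{[predicate] subject object}` pattern order, the function name is hypothetical, and duplicate rows are removed with a set.

```python
def linear_features_in(triples, county, limit=100):
    """Return up to `limit` sorted (name, type label) pairs for features that
    have a line on either side of any polygon making up the county's area.

    triples: set of (predicate, subject, object) tuples.
    """
    def objects(pred, subj):
        return {o for p, s, o in triples if p == pred and s == subj}

    # County -> its area polygons.
    areas = objects("tiger:area", county)
    # Polygons -> lines that have one of them on the left or right.
    lines = {s for p, s, o in triples
             if p in ("tiger:atLeft", "tiger:atRight") and o in areas}
    # Lines -> features whose path includes them.
    features = {s for p, s, o in triples if p == "tiger:path" and o in lines}

    rows = set()
    for f in features:
        for name in objects("tiger:name", f):
            for ftype in objects("rdf:type", f):
                for label in objects("rdfs:label", ftype):
                    rows.add((name, label))
    return sorted(rows)[:limit]
```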

Example 3 - Find all the towns and their respective counties in Vermont

// there isn't a single class for town but all
// town equivalents share a common superclass
rulebase x
{
	infer {[rdfs:subClassOf] ?x ?z} from {[rdfs:subClassOf] ?x ?y} and {[rdfs:subClassOf] ?y ?z};
	infer {[rdf:type] ?a ?y} from {[rdfs:subClassOf] ?x ?y} and {[rdf:type] ?a ?x};
};

select ?state_name ?county_name ?town_name ?town_type_label using geo rulebase x where
	{[tiger:name] ?state 'Vermont'}
	and {[rdf:type] ?state [http://www.census.gov/tiger/2002/fips/area_description/01]}
	and {[tiger:intersectsArea] ?state ?town}
	and {[rdf:type] ?town ?town_type}
	and {[rdfs:subClassOf] ?town_type [http://www.census.gov/tiger/2002/fips/area_description/MCD]}
	and not {[rdfs:subClassOf] ?sup ?town_type}
	and {[tiger:name] ?state ?state_name}
	and {[tiger:name] ?town ?town_name}
	and {[rdfs:label] ?town_type ?town_type_label}
	and {[tiger:intersectsArea] ?town ?county}
	and {[rdf:type] ?county [http://www.census.gov/tiger/2002/fips/area_description/06]}
	and {[tiger:name] ?county ?county_name}
order by ?county_name, ?town_name
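The two rules in `rulebase x` are ordinary forward-chaining rules: subclass transitivity, and inheritance of `rdf:type` up the class hierarchy. The Python sketch below applies them naively until a fixed point is reached, then repeats the query's leaf-class test (`not {[rdfs:subClassOf] ?sup ?town_type}`, i.e. the type has no subclasses of its own). The function names and the pair-based representation are assumptions made for the illustration; a real reasoner would evaluate the rules far more efficiently.

```python
def apply_rulebase_x(subclass_of, types):
    """Forward-chain the two rules of rulebase x until nothing new is inferred.

    subclass_of: set of (subclass, superclass) pairs.
    types: set of (instance, class) pairs.
    """
    sub, typ = set(subclass_of), set(types)
    while True:
        # Rule 1: rdfs:subClassOf is transitive.
        new_sub = {(x, z) for (x, y) in sub for (y2, z) in sub if y == y2} - sub
        # Rule 2: an instance of a class is also an instance of its superclasses.
        new_typ = {(a, y) for (a, x) in typ for (x2, y) in sub if x == x2} - typ
        if not new_sub and not new_typ:
            return sub, typ
        sub |= new_sub
        typ |= new_typ


def most_specific_types(instance, root, sub, typ):
    """Types of `instance` that fall under `root` and have no subclasses,
    mirroring the query's leaf-class test."""
    has_subclass = {y for (_, y) in sub}
    return {c for (a, c) in typ
            if a == instance and (c, root) in sub and c not in has_subclass}
```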

Example 4 - Find all of the landmarks

select ?landmark_name ?landmark_type_label using geo where
	{[rdf:type] ?landmark [tiger:Landmark]}
	and {[tiger:name] ?landmark ?landmark_name}
	and {[rdf:type] ?landmark ?landmark_type}
	and ?landmark_type <> [tiger:Landmark]
	and {[rdfs:label] ?landmark_type ?landmark_type_label}
order by ?landmark_type_label, ?landmark_name

Example 5 - Find the smallest rectangle that will hold the town of Addison

select ?minlat ?maxlat ?minlong ?maxlong using geo where
	{[rdf:type] ?town [http://www.census.gov/tiger/2002/fips/area_description/43]}
	and {[tiger:name] ?town 'Addison'}
	and {[tiger:area] ?town ?area}
	and ({[tiger:atLeft] ?line ?area} or {[tiger:atRight] ?line ?area})
	and {[tiger:start] ?line ?start}
	and {[tiger:end] ?line ?end}
	and ({[tiger:lat] ?start ?lat} or {[tiger:lat] ?end ?lat})
	and ({[tiger:long] ?start ?long} or {[tiger:long] ?end ?long})
	and ?minlat=min(?lat)
	and ?maxlat=max(?lat)
	and ?minlong=min(?long)
	and ?maxlong=max(?long)
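Because only the start and end nodes of each line were converted (shape nodes were skipped), the rectangle in Example 5 is computed from line end points alone, so it may slightly underestimate the true extent of curved boundaries. The same min/max aggregation in plain Python, assuming a hypothetical tuple layout for lines, looks like this:

```python
def bounding_box(lines):
    """Return (min_lat, max_lat, min_long, max_long) over all line end points.

    lines: iterable of ((start_lat, start_long), (end_lat, end_long)) tuples.
    """
    lats, longs = [], []
    for (s_lat, s_long), (e_lat, e_long) in lines:
        # Like the query, consider both the start and the end of every line.
        lats += [s_lat, e_lat]
        longs += [s_long, e_long]
    return min(lats), max(lats), min(longs), max(longs)
```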

Example 6 - Find all features within 1 mile of Middlebury College

// this could be combined into a single query but this demonstrates
// the integration between the deductive database language and javascript
var rs = (select ?lat ?long using geo where
	{[tiger:name] ?landmark 'Middlebury College'}
	and {[tiger:location] ?landmark ?location}
	and {[tiger:lat] ?location ?lat}
	and {[tiger:long] ?location ?long});

getNear(rs["long"], rs["lat"]);

function getNear(lon, lat)
{
	var mile = 1000000/72;
	var dist = integer(mile * 1);

	rulebase geo
	{
		//assume the world is flat :-)
		infer getdistance (?dist, ?lat1, ?lon1, ?lat2, ?lon2) from
			?dist=pow( add(pow(sub(?lat1, ?lat2), 2),pow(sub(?lon1, ?lon2), 2)), .5)
	}

	//the query approximates first with a square (to take advantage of indices)
	//then winnows results according to max radial distance
	select ?name ?fl using geo rulebase geo where
		{[tiger:long] ?s ?lon}
		and between(?lon, #(lon - dist), #(lon + dist))
		and {[tiger:lat] ?s ?lat}
		and between(?lat, #(lat - dist), #(lat + dist))
		and getdistance(?d, ?lat, ?lon, #lat, #lon)
		and ?d < #dist
		and ({[tiger:start] ?x ?s} or {[tiger:end] ?x ?s})
		and {[tiger:path] ?f ?x}
		and {[rdf:type] ?f ?ft}
		and {[rdfs:label] ?ft ?fl}
		and switch (?f)(
			case {[tiger:name] ?f ?n}: ?name=?n
			default: ?name=''
		)
	order by ?name;
}
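The two-stage filter in `getNear` (a cheap square test followed by an exact radius test) can be sketched in Python as below. It reuses the example's flat-earth assumption and its 1000000/72 units-per-mile constant, with coordinates as integers in millionths of a degree; `points_near` and its input format are hypothetical.

```python
import math

# Same approximation as getNear: 1000000/72 coordinate units per mile.
UNITS_PER_MILE = 1_000_000 / 72


def points_near(points, lon, lat, miles=1.0):
    """Return the ids of points within `miles` of (lon, lat), flat-earth style.

    points: iterable of (id, lon, lat) with coordinates in millionths of a degree.
    """
    radius = int(UNITS_PER_MILE * miles)
    result = []
    for pid, p_lon, p_lat in points:
        # Stage 1: square test, the part an index on lat/long can answer cheaply.
        if not (lon - radius <= p_lon <= lon + radius
                and lat - radius <= p_lat <= lat + radius):
            continue
        # Stage 2: radial test on the survivors, dropping the square's corners.
        if math.hypot(p_lon - lon, p_lat - lat) < radius:
            result.append(pid)
    return result
```

As in the original, one unit of longitude is treated as the same distance as one unit of latitude. Since a degree of longitude covers less ground than a degree of latitude at Vermont's latitude, the search region on the ground is narrower east to west than north to south.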

Example 7 - Find the outline of Addison County and display as SVG

// this example also demonstrates the templating capabilities of RDFQL
// finding the boundary lines (of all the lines associated with all of the polygons)
// can be pretty expensive - might be better to add a [tiger:boundaryLine] property

var rs = (select ?start_lat ?start_long ?end_lat ?end_long ?min_lat ?max_lat ?min_long ?max_long using geo where 
	{[rdf:type] ?county [http://www.census.gov/tiger/2002/fips/area_description/06]} 
	and {[tiger:name] ?county 'Addison'} 
	and {[tiger:area] ?county ?poly}
	and (
		{[tiger:atLeft] ?line ?poly} and not ({[tiger:atRight] ?line ?poly2} and {[tiger:area] ?county ?poly2})
			or
		{[tiger:atRight] ?line ?poly} and not ({[tiger:atLeft] ?line ?poly2} and {[tiger:area] ?county ?poly2})
	    )
	and {[tiger:start] ?line ?start}
	and {[tiger:lat] ?start ?start_lat}
	and {[tiger:long] ?start ?start_long}
	and {[tiger:end] ?line ?end}
	and {[tiger:lat] ?end ?end_lat}
	and {[tiger:long] ?end ?end_long}
	and (?lat=?start_lat or ?lat=?end_lat)
	and (?long=?start_long or ?long=?end_long)
	and ?max_lat=max(?lat)
	and ?max_long=max(?long)
	and ?min_lat=min(?lat)
	and ?min_long=min(?long)
	);

var minx = rs["min_long"]/1000;
var miny = rs["min_lat"]/1000;
var maxx = rs["max_long"]/1000;
var maxy = rs["max_lat"]/1000;
var width=maxx-minx;
var height=maxy-miny;

response.contentType="image/svg+xml";

%><svg xmlns='http://www.w3.org/2000/svg' width="300" height="300" viewBox="0 0 <%=width%> <%=height%>">
<g style="stroke-width:1">
<%
for (rs.moveFirst();!rs.EOF;rs.MoveNext())
{
	%><line x1="<%=rs[1]/1000 - minx%>" y1="<%=height-(rs[0]/1000 - miny)%>" x2="<%=rs[3]/1000 - minx%>" y2="<%=height-(rs[2]/1000 - miny)%>" style="stroke:black;stroke-linecap:round"/>
<%	
}
%>
</g>
</svg>
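The expensive part of Example 7 is the boundary test: a line belongs to the county outline when one of the county's polygons lies on exactly one of its sides. The Python sketch below restates that test, together with the coordinate transform the SVG template applies (shift to the minimum corner, divide by 1000, and flip y because SVG's y axis points down). Representing lines as `(line_id, left_poly, right_poly)` tuples is an assumption, and both function names are hypothetical.

```python
def boundary_lines(lines, county_polys):
    """Keep lines with one of the county's polygons on exactly one side.

    lines: iterable of (line_id, left_poly, right_poly); a side may be None.
    county_polys: set of polygon ids that make up the county's area.
    """
    result = []
    for line_id, left, right in lines:
        # Interior lines have county polygons on both sides; unrelated lines on neither.
        if (left in county_polys) != (right in county_polys):
            result.append(line_id)
    return result


def to_svg_coords(lon, lat, min_long, min_lat, height, scale=1000):
    """Map a (long, lat) pair to SVG (x, y) the way the template does."""
    x = lon / scale - min_long / scale
    y = height - (lat / scale - min_lat / scale)
    return x, y
```

Precomputing this test once per county is roughly what the [tiger:boundaryLine] property suggested in the example's comments would provide.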