
.join(...) transformation
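The join(RDD') transformation is called on an RDD of (key, val_left) pairs and takes another RDD of (key, val_right) pairs as its argument. It returns an RDD of (key, (val_left, val_right)), with one element for every combination of left and right values that share the same key. Outer joins are supported through leftOuterJoin, rightOuterJoin, and fullOuterJoin.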
Look at the following code snippet:
# Flights data
# e.g. (u'JFK', u'01010900')
flt = flights.map(lambda c: (c[3], c[0]))
# Airports data
# e.g. (u'JFK', u'NY')
air = airports.map(lambda c: (c[3], c[1]))
# Execute inner join between RDDs
flt.join(air).take(5)
This will give you a result similar to the following (which five rows take(5) returns, and in what order, can vary between runs):
# Output
[(u'JFK', (u'01010900', u'NY')),
(u'JFK', (u'01011200', u'NY')),
(u'JFK', (u'01011900', u'NY')),
(u'JFK', (u'01011700', u'NY')),
(u'JFK', (u'01010800', u'NY'))]
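To make the pairing behavior concrete, the following is a minimal sketch in plain Python (no Spark required) of what an inner join and a left outer join do to lists of (key, value) pairs. The helper names inner_join and left_outer_join and the sample data are hypothetical and only illustrate the semantics; they are not how Spark implements its distributed joins.

```python
from collections import defaultdict

def group_by_key(pairs):
    """Collect the values of (key, value) pairs into a list per key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def inner_join(left, right):
    """Mimic RDD.join: emit (key, (val_left, val_right)) for keys present in both inputs."""
    right_by_key = group_by_key(right)
    return [(k, (lv, rv)) for k, lv in left for rv in right_by_key.get(k, [])]

def left_outer_join(left, right):
    """Mimic RDD.leftOuterJoin: keep every left pair, with None when the key has no right match."""
    right_by_key = group_by_key(right)
    return [(k, (lv, rv)) for k, lv in left for rv in right_by_key.get(k, [None])]

# Hypothetical sample data shaped like the flt and air RDDs above
flt_sample = [("JFK", "01010900"), ("JFK", "01011200"), ("XXX", "01011500")]
air_sample = [("JFK", "NY"), ("SEA", "WA")]

inner = inner_join(flt_sample, air_sample)
left_outer = left_outer_join(flt_sample, air_sample)
```

In this sketch, the flight keyed by "XXX" has no matching airport, so it is dropped by the inner join but kept by the left outer join with None in place of the missing right-hand value. The airport keyed by "SEA" has no matching flight and appears in neither result.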