Pandas

RDDs are the new bytecode of Apache Spark

With the Apache Spark 1.3 release the Dataframe API for Spark SQL got introduced, for those of you who missed the big announcements, I'd recommend to read the article : Introducing Dataframes in Spark for Large Scale Data Science from the Databricks blog. Dataframes are very popular among data scientists, personally I've mainly been using them with the great Python library Pandas but there are many examples in R (originally) and Julia.