Quick Tip – Caching in Spark SQL

Caching in Spark SQL works a bit differently from caching plain RDDs. Because Spark SQL knows the type of each column, it can store the data more efficiently. To make sure we cache the memory-efficient representation rather than the full row objects, we should use the special hiveCtx.cacheTable("tableName") method. When caching a table, Spark SQL represents the data in an in-memory columnar format. The cached table remains in memory only for the life of our driver program, so if the driver exits we will need to recache the data. As with RDDs, we cache tables when we expect to run multiple tasks or queries against the same data.

You can also cache tables using HiveQL/SQL statements: to cache or uncache a table, simply run CACHE TABLE tableName or UNCACHE TABLE tableName. This is most commonly used from command-line clients connected to the JDBC server.
