As data scientists, we find ourselves working with increasingly large and complex data in our day-to-day work. The standard toolset of a data scientist, R or Python on a single workstation with dataframes stored locally, has not evolved to meet this need. Apache Spark represents the evolution of the standard data science toolset to accommodate datasets of almost any size.
Spark enables the use of dataframes and SQL queries (as well as the map/reduce paradigm) on datasets of almost any size. When deployed in the cloud using Amazon Web Services, Spark has been proven to work effectively with petabytes (thousands of terabytes) of data. For example, within the past two years, Spark clusters deployed on AWS have set world records for sorting large datasets, sorting 100 terabytes of data in 24 minutes and 1 petabyte of data in less than 4 hours. Large companies use Spark to process data at scale in near-real-time.
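If the map/reduce paradigm mentioned above is new to you, here is a minimal single-machine sketch in plain Python. It does not use Spark; the sample lines and the function names map_line and merge_counts are made up for illustration. The idea is that a "map" step processes each piece of data independently and a "reduce" step merges the partial results, which is what lets a system like Spark spread the work across many machines.

```python
from collections import Counter
from functools import reduce

# Hypothetical sample input; on a cluster these lines would be
# split across many machines instead of sitting in one list.
lines = [
    "spark makes big data simple",
    "big data needs big tools",
]


def map_line(line):
    """Map step: turn one line into a partial word count."""
    return Counter(line.split())


def merge_counts(left, right):
    """Reduce step: merge two partial word counts into one."""
    return left + right


# Each line is mapped independently, then the partial counts are merged.
partial_counts = map(map_line, lines)
word_counts = reduce(merge_counts, partial_counts, Counter())

print(word_counts.most_common(2))
```

Because each call to map_line depends only on its own line, the map step could run anywhere in any order; only the merge needs to bring results together.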
Installing Spark locally on your personal computer will empower you to gain experience with this breakthrough technology.
This workshop is for anyone with a strong personal or professional interest in data science. This is an introductory workshop, so we don’t expect you to know anything in particular about Spark. All you need to come to our workshop is a working knowledge of Python programming, a laptop running Mac OS X or Linux, and a readiness to learn.
This beginner workshop is designed to help you install Apache Spark for the first time on a Macintosh or Linux computer, or to upgrade to Spark 2.0.1 from an older version of Spark.
Spark is a powerful, open source processing engine for data distributed across large clusters. Spark is optimized for speed and ease of use; it uses in-memory caching to run distributed algorithms up to 100x faster than MapReduce. Spark can be used for batch processing and for processing data in near-real-time.
Jean-François (Jeff) Omhover, Galvanize Data Science Instructor
Jeff is a Senior Data Scientist and Instructor in the Galvanize Data Science Immersive program. Prior to joining Galvanize, Jeff was an Assistant Professor at one of the leading engineering schools in France. He managed large scale multidisciplinary research projects in partnership between industry and academia. He has used Spark and Natural Language Processing for mining consumer sentiment and brand perception from user comments, and for mining concepts from scientific papers.
Miles is a Data Scientist and Associate Instructor in the Galvanize Data Science Immersive program. Before joining Galvanize, Miles worked as a systems/network engineering consultant and taught college-level classes in IT infrastructure and security. Miles has contributed to the development of widely recognized certification exams for server engineers. Miles is a graduate of the University of Washington and is a co-organizer of the local Python community in Seattle.
This workshop is limited to 35 people. Please sign up via Eventbrite.
About our Sponsor
Galvanize is the premier dynamic learning community for technology. With campuses located in booming technology sectors throughout the country, Galvanize provides a community for each of the following:
To learn more about Galvanize, visit galvanize.com.