


Intro to Spark for Data Science: Two-Day Workshop

December 4 @ 9:00 am - 5:00 pm

Recurring Event: daily until December 4, 2016

Apache Spark simplifies working with data at scale, making it faster to do machine learning on large data sets. Used by data professionals at Amazon, eBay, NASA, and 200+ other organizations, Spark’s community is one of the fastest-growing in the world.

In this workshop, take your data skills to the next level by using Spark to build data pipelines. Workshop instructors will be on hand all weekend to teach, live code, and help debug as we work through the course materials.

After completing this weekend workshop, you’ll be better prepared to use Spark for real projects and problems on your own. We’ll use Spark to power product recommendations and natural language processing tasks. It’s a great way to quickly ramp up your data skills in just 2 days.

Who Is This Course For?

This is an introductory course, so we don’t assume you know anything in particular about Spark. To attend, all you need is a working knowledge of programming, a laptop, and a readiness to learn. The more you know before the course, the more you’ll get out of it, so we recommend the pre-course materials below:

  • We recommend installing Anaconda (https://www.continuum.io/downloads/), a Python distribution widely used for data science and data engineering.
  • Learn Python The Hard Way (https://learnpythonthehardway.org/book/) is a free, long-form Python tutorial that will help to reinforce your basic Python skills, especially if you are relatively new to Python coding.
  • After you register for the course, your Galvanize instructors will provide detailed instructions for installing and configuring Spark and Hadoop on your computer and help you along the way if you need it.
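Once everything is installed, a quick self-check can confirm the basics are in place. The sketch below is a minimal, hypothetical helper (`check_environment` is our own name, not part of any tool). It only reports your Python version, whether Python can import `pyspark`, and whether a `java` executable is on your PATH, since Spark runs on the JVM. Your instructors' setup instructions remain the authoritative guide.

```python
import importlib.util
import shutil
import sys


def check_environment():
    """Report whether pieces a local PySpark setup usually needs are present.

    Hypothetical helper: it only looks for them; it installs nothing.
    """
    return {
        # Python version as a (major, minor) tuple, e.g. (3, 5).
        "python_version": sys.version_info[:2],
        # True if the pyspark package can be imported by this interpreter.
        "pyspark_importable": importlib.util.find_spec("pyspark") is not None,
        # Spark runs on the JVM, so a `java` executable is normally required.
        "java_on_path": shutil.which("java") is not None,
    }


if __name__ == "__main__":
    for name, value in check_environment().items():
        print("{}: {}".format(name, value))
```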

Course Outline

December 3rd & 4th, 2016, 9:00 AM – 5:00 PM (Lunch provided)

In this two-day, in-person, hands-on Spark course, we will:

  • Import, clean, and query data using Spark RDDs and Spark SQL.
  • Build a product recommendation system using Spark.
  • Perform Natural Language Processing (NLP) using Spark.
  • Use Amazon Web Services (AWS) to deploy a Spark cluster.
  • Use a Spark cluster to process large datasets that cannot fit on your personal computer.
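The first outline item, importing, cleaning, and querying data, follows a map, filter, and aggregate pattern. The sketch below imitates that pattern in plain Python so it runs without a cluster; the sample records and the helper names (`parse`, `reduce_by_key`, `average_rating_per_product`) are made up for illustration. In PySpark the same steps would usually be chained on an RDD with methods such as `map`, `filter`, and `reduceByKey`.

```python
# Hypothetical raw input: "user_id,product_id,rating" records, one malformed.
RAW_LINES = [
    "u1,p1,5",
    "u2,p1,3",
    "u1,p2,4",
    "bad record",
    "u3,p2,2",
]


def parse(line):
    """Split a CSV line into (user, product, rating), or return None if malformed."""
    parts = line.split(",")
    if len(parts) != 3:
        return None
    user, product, rating = parts
    try:
        return user, product, float(rating)
    except ValueError:
        return None


def reduce_by_key(pairs, func):
    """Plain-Python stand-in for RDD.reduceByKey: combine values sharing a key."""
    result = {}
    for key, value in pairs:
        result[key] = func(result[key], value) if key in result else value
    return result


def average_rating_per_product(lines):
    # map: parse each line; filter: drop records that failed to parse.
    records = [r for r in map(parse, lines) if r is not None]
    # map to (product, (rating, 1)) so sums and counts combine per key.
    pairs = [(product, (rating, 1)) for _, product, rating in records]
    totals = reduce_by_key(pairs, lambda a, b: (a[0] + b[0], a[1] + b[1]))
    # Final map: turn each (sum, count) into an average.
    return {product: s / n for product, (s, n) in totals.items()}


print(average_rating_per_product(RAW_LINES))  # {'p1': 4.0, 'p2': 3.0}
```

With Spark, the data would be partitioned across machines and `reduceByKey` would combine values within each partition before shuffling them, but the logic of each step stays the same.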

Meet Your Instructors

Jean-François (Jeff) Omhover, Galvanize Data Science Instructor

Jeff is a Senior Data Scientist and Instructor in the Galvanize Data Science Immersive program. Prior to joining Galvanize, Jeff was an Assistant Professor at one of the leading engineering schools in France, where he managed large-scale multidisciplinary research projects run as partnerships between industry and academia. He has used Spark and Natural Language Processing for mining consumer sentiment and brand perception from user comments, and for mining concepts from scientific papers.

Miles Erickson, Galvanize Data Science Associate Instructor

Miles is a Data Scientist and Associate Instructor in the Galvanize Data Science Immersive program. Before joining Galvanize, Miles worked as a systems/network engineering consultant and taught college-level classes in IT infrastructure and security. Miles has contributed to the development of widely recognized certification exams for server engineers. Miles is a graduate of the University of Washington and is a co-organizer of the local Python community in Seattle.

Why Spark?

Spark is a powerful, open-source processing engine for data distributed across large clusters. Spark is optimized for speed and ease of use: by caching data in memory rather than writing intermediate results to disk, it can run some distributed algorithms up to 100x faster than Hadoop MapReduce. Spark can be used for batch processing and for processing data in near real time.
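Two ideas behind that speed are lazy evaluation (transformations such as `map` only describe work; actions such as `count` or `collect` trigger it) and caching (keeping a computed dataset in memory so later actions reuse it). The toy class below is a hypothetical, single-machine model of those two ideas, not Spark's implementation; `LazyDataset` and its read counter are invented for illustration.

```python
class LazyDataset:
    """Toy model of Spark-style lazy transformations with optional caching."""

    def __init__(self, compute):
        self._compute = compute  # zero-argument function that builds a list
        self._cached = None
        self._should_cache = False

    @classmethod
    def from_list(cls, items, counter):
        def compute():
            counter["source_reads"] += 1  # count each read of the source data
            return list(items)
        return cls(compute)

    def map(self, func):
        # Transformation: nothing runs yet; we only record a new recipe.
        return LazyDataset(lambda: [func(x) for x in self._materialize()])

    def filter(self, pred):
        return LazyDataset(lambda: [x for x in self._materialize() if pred(x)])

    def cache(self):
        # Mark this dataset to be kept in memory once it is first computed.
        self._should_cache = True
        return self

    def _materialize(self):
        if self._cached is not None:
            return self._cached
        data = self._compute()
        if self._should_cache:
            self._cached = data
        return data

    # Actions: these force the whole chain of transformations to run.
    def collect(self):
        return list(self._materialize())

    def count(self):
        return len(self._materialize())


counter = {"source_reads": 0}
evens = (LazyDataset.from_list(range(10), counter)
         .map(lambda x: x * x)
         .filter(lambda x: x % 2 == 0))
print(counter["source_reads"])  # 0: transformations alone do no work

evens.count()
evens.collect()
print(counter["source_reads"])  # 2: each action recomputed from the source

evens.cache()
evens.count()
evens.collect()
print(counter["source_reads"])  # 3: computed once more, then served from memory
```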

About Galvanize

Galvanize is the premier dynamic learning community for technology. With campuses in booming technology hubs throughout the country, Galvanize provides a community for each of the following:

Education – part-time and full-time training in web development, data science, and data engineering

Workspace – whether you’re a freelancer, startup, or established business, we provide beautiful spaces with a community dedicated to supporting your company’s growth

Networking – events in the tech industry happen constantly on our campuses, ranging from popular Meetups to multi-day international conferences

To learn more about Galvanize, visit galvanize.com.

To learn more about our data science initiatives, visit http://www.galvanize.com/data-science/.

Software & Prerequisite Requirements

This workshop series is for anyone who wants to use Spark to analyze data at scale.

Programming Experience:

Course examples and exercises will use Python and PySpark. Basic working knowledge of the Python programming language (i.e. the ability to write scripts and functions in Python) is required.

Command Line & Version Control:

Basic knowledge of the Unix command line is required.
We will use GitHub for sharing and maintaining code. If you do not already have a GitHub account, you will need to create one before class begins.

Technology Requirements:

Students are expected to bring a personal computer running the Mac OS X or Linux operating system, with at least 4GB of RAM and at least 6GB of free disk space.
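If you are unsure whether your laptop qualifies, a short script can check. This is a sketch under a few assumptions: it reads total memory through POSIX `os.sysconf` (available on Linux and macOS), measures free space in your home directory with `shutil.disk_usage`, and compares the results against the 4GB and 6GB figures in decimal gigabytes. The function names are our own.

```python
import os
import shutil

MIN_RAM_GB = 4          # workshop requirement
MIN_FREE_DISK_GB = 6    # workshop requirement
# Decimal gigabytes; a "4GB" machine may report slightly less usable memory
# than 4 * 1024**3 bytes, so the decimal unit is the more lenient choice.
BYTES_PER_GB = 10 ** 9


def total_ram_bytes():
    """Total physical memory via POSIX sysconf (Linux and macOS)."""
    return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")


def free_disk_bytes(path="."):
    """Free space on the filesystem that contains `path`."""
    return shutil.disk_usage(path).free


def meets_requirements(ram_bytes, free_bytes):
    return (ram_bytes >= MIN_RAM_GB * BYTES_PER_GB
            and free_bytes >= MIN_FREE_DISK_GB * BYTES_PER_GB)


if __name__ == "__main__":
    ram = total_ram_bytes()
    disk = free_disk_bytes(os.path.expanduser("~"))
    print("RAM: {:.1f} GB, free disk: {:.1f} GB".format(
        ram / BYTES_PER_GB, disk / BYTES_PER_GB))
    print("Meets requirements:", meets_requirements(ram, disk))
```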

 

Refund Policy

  • Cancel more than 48 hours before the first class: full refund (minus processing fee).
  • Cancel after the first day: no refund, but the cost can be applied to a different program.

*Students are responsible for monitoring and controlling expenses incurred on their own Amazon Web Services accounts. A typical student is expected to spend less than $50 on AWS usage during this course. Amazon bills by the hour for usage of cloud resources and students are advised to shut down their Spark clusters at the end of the workshop. Accidentally leaving a Spark cluster running when it is not in use could result in significant overages.
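Because charges accrue for as long as a cluster runs, it helps to do the arithmetic before the workshop. The sketch below assumes each started hour is billed as a full hour, and the three-node cluster and $0.50 per node-hour rate are placeholders, not actual AWS prices; check current pricing for your instance type and region. `cluster_cost` is a hypothetical helper.

```python
import math


def cluster_cost(nodes, hourly_rate_per_node, hours_running):
    """Estimate a cluster bill, assuming every started hour is billed in full.

    `hourly_rate_per_node` is a placeholder; look up the real price for your
    instance type and region.
    """
    billed_hours = math.ceil(hours_running)
    return nodes * hourly_rate_per_node * billed_hours


# Hypothetical example: 3 nodes at $0.50 per node-hour.
used_in_class = cluster_cost(3, 0.50, 7.5)        # shut down after the session
forgotten = cluster_cost(3, 0.50, 7.5 + 24 * 7)   # left running another week
print(used_in_class, forgotten)  # 12.0 264.0
```

Even at this modest placeholder rate, a cluster forgotten for a week costs more than twenty times the class session.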

Please contact amara.wilson@galvanize.com for more details.

Details

Date:
December 4
Time:
9:00 am - 5:00 pm
Website:
https://www.eventbrite.com/e/intro-to-spark-for-data-science-seattle-123-tickets-29131088871

Venue

Galvanize Seattle
111 South Jackson Street
Seattle, WA 98104 United States

Organizer

Galvanize Seattle