This event will feature a series of brief, engaging lightning talks with data scientists discussing Apache Spark.
Each speaker presents for under 20 minutes, then takes audience questions for 15 minutes.
These lightning talks are for anyone with a strong personal or professional interest in data science, data engineering, and/or Apache Spark. Beginners are welcome!
Apache Spark is a powerful, open-source processing engine for data distributed across large clusters. Spark is optimized for speed and ease of use; it caches data in memory to run distributed algorithms up to 100x faster than Hadoop MapReduce. Spark can be used for batch processing and for processing data in near real time.
David Valpey, Data Scientist
Our legal system depends on knowledge of what came before. Anyone working within our legal system must navigate a large amount of data. David describes his use of Python, Apache Spark, NumPy, NLTK, and BeautifulSoup to extract key features of Washington State Supreme Court opinions and navigate them by similarity.
David is a data scientist with a background in computer science and linguistics.
Sal Khan, Data Scientist
Pandora and other popular music services recommend individual songs based on their musical characteristics. Because he prefers listening to entire albums, Sal built an album recommendation system using Spark ML. He also deployed it as a live web application using Flask and uWSGI.
Sal is a data scientist with a background in consulting and business analytics.
Rob Dalton, Data Scientist
Amazon and other online shopping sites provide information on product quality in the form of customer reviews. Rob Dalton has built a PySpark application that mines Amazon product reviews, extracting the most criticized features and the most praised features of each product. This tool can help customers save time spent reading full reviews, and it can help companies identify potential product defects.
Rob is a data scientist with a background in management consulting and web development.