


Building Data Workflows and Stateful Pipelines (with Airflow and Alooma)

February 23 @ 6:00 pm - 8:30 pm


Learn from Data Engineers at Alooma and Airbnb!

Talk 1: Unlock the Power of Data with Stateful Processing

Achieving stateful processing of events remains a challenge for many data teams. Scalability, exactly-once processing with state, on-stream joins across different sources, and building sessions are just a few of the hurdles to building a stateful data application. In this talk we will discuss the considerations involved in building stateful data applications and how to move existing batch-oriented processes into the realm of real-time event streams.
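To make one of those hurdles concrete, here is a minimal pure-Python sketch of sessionization — grouping a user's event stream into sessions separated by inactivity. This is illustrative only, not Alooma's implementation; the function name, the 30-minute gap, and the `(user, timestamp)` event shape are all assumptions for the example.

```python
from collections import defaultdict

SESSION_GAP = 30 * 60  # seconds of inactivity that closes a session (illustrative choice)

def sessionize(events):
    """Group (user_id, timestamp) events into per-user sessions.

    Events are assumed sorted by timestamp; a new session starts when
    the gap since the user's previous event exceeds SESSION_GAP.
    """
    sessions = defaultdict(list)  # user_id -> list of sessions, each a list of timestamps
    last_seen = {}                # user_id -> timestamp of that user's most recent event
    for user, ts in events:
        if user not in last_seen or ts - last_seen[user] > SESSION_GAP:
            sessions[user].append([ts])        # open a new session
        else:
            sessions[user][-1].append(ts)      # extend the current session
        last_seen[user] = ts
    return dict(sessions)
```

The `last_seen` dictionary is exactly the kind of per-key state that makes this hard to scale: in a real streaming system it must survive restarts and be partitioned across workers, which is where exactly-once guarantees come into play.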

About the Speaker:

Yair Weinberger is a veteran developer and researcher, and Co-Founder & CTO at Alooma. Disrupting the data layer is his day job; in his spare time he enjoys playing chess with a knight and pawns against his 3-year-old. He can be found on Quora or on Twitter at @yairwein.

Talk 2: Building Data Workflows with Airflow

Moving and transforming data from one system to the next is still a large part of the job of a data scientist or engineer. Do you have batch data processing tasks that need scheduling? Then Apache Airflow might be the solution for you. Airbnb has been using Apache Airflow in production for over two years to schedule and orchestrate data pipelines. We will present the basics of Airflow, what it can do for you and what you need to know to get started.

Apache Airflow is a popular open source platform for authoring, orchestrating, and monitoring pipelines. Airflow can accommodate complex dependencies and run tasks on a variety of systems and cloud platforms. Its Python-based DSL makes Airflow easily extensible and dynamic, and the Airflow UI lets you track the progress of your ETL processes and collect statistics about your jobs. You can get started with Airflow on a single node, but it can scale to multi-worker operations with thousands of workflows and tens of thousands of tasks per day. We will share some of the lessons learned from two years of Airflow operations: where Airflow works well and where it can still improve. Finally, Airflow is open source, incubating under the Apache Software Foundation, with an active community that welcomes contributions.
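The heart of what Airflow's DSL expresses is a directed acyclic graph of tasks with upstream dependencies. As a plain-Python sketch of that idea (this is not Airflow code — a real DAG is declared with Airflow operators and `>>` / `set_upstream`; the `deps` mapping and task names here are invented for illustration), resolving a DAG into an execution order looks like:

```python
def topo_order(deps):
    """Return a task execution order that respects upstream dependencies.

    deps maps each task name to the set of upstream tasks that must
    finish before it runs -- a tiny stand-in for Airflow's DAG model.
    """
    order, done = [], set()

    def visit(task, path=()):
        if task in done:
            return
        if task in path:
            raise ValueError("cycle detected: not a DAG")  # Airflow rejects cycles too
        for upstream in sorted(deps.get(task, ())):
            visit(upstream, path + (task,))
        done.add(task)
        order.append(task)

    for task in sorted(deps):
        visit(task)
    return order
```

For a classic extract → transform → load pipeline, `topo_order({"extract": set(), "transform": {"extract"}, "load": {"transform"}})` yields the tasks in dependency order, which is the scheduling guarantee Airflow provides on top of retries, backfills, and monitoring.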

About the Speaker:

Arthur Wiedmer joined Airbnb as a Data Engineer in 2014. He builds frameworks and pipelines on top of Airflow and focuses on disseminating best practices for pipelines at Airbnb as part of the Data Platform Team. He is a committer on Apache Airflow (incubating).


6:00pm – Doors open, networking, snacking

6:20pm – Talk Kick-Off

7:45pm – Final notes, Open Q&A

8:00pm – Conclusion

About the Galvanize Data Science Immersive:

This program is for people who are serious about becoming data scientists. Over the course of 12 weeks you’ll learn the tools, techniques, and fundamental concepts you need to know to make an impact as a data scientist. You’ll work through messy, real-world data sets to gain experience across the data science stack: data munging, exploration, modeling, validation, visualization, and communication. Our unique setting in the Galvanize community of startups and tech companies is the perfect place to learn and expand your network!

If you have any questions about Galvanize’s Data Science Immersive program before this event, please feel free to reach out to our Admissions Advisor, Victor, at victor.valdez@galvanize.com.




44 Tehama Street
San Francisco, CA 94105 US