Everything You Need to Know about Data Science Conferences


Hello data scientists! In this post, I’m going to cover the top data science conferences in the United States. If you want to attend one of these conferences, I’ve included a section with discounts as a bonus 🙂

Strata + Hadoop World

Presented by O’Reilly and Cloudera, Strata + Hadoop World is where big data, cutting-edge data science, and new business fundamentals intersect and merge.

Who Should Attend:

Strata + Hadoop World is where big data’s most influential business decision makers, strategists, architects, developers, and analysts gather to shape the future of their businesses and technologies.

At Strata + Hadoop you will:

  • Be among the first to understand how you can leverage the promise of this huge change, and survive the resulting disruption
  • Find new ways to leverage your data assets across industries and disciplines
  • Learn how to take big data from science project to real business application
  • Discover training, hiring, and career opportunities for data professionals
  • Meet face-to-face with other innovators and thought leaders


Edd Dumbill announced the launch of Strata in September 2010. The first Strata Conference was held in Santa Clara in 2011. When talking about the vision for Strata, Dumbill states:

“We believe that the future belongs to those who understand how to collect and use their data successfully. There’s a change in both the skills of data analysts and the technology they use that’s sweeping through industry and science. Our aim with Strata is to be the defining event for that change: for practitioners, businesses and data vendors.”


PyData

PyData brings together users and developers of data analysis tools to share ideas and learn from each other. The conference addresses evolving challenges in data management, processing, analytics, and visualization.

Who should attend?

Developers and users of data analysis tools including Python, R, and Julia.


PyData was initially founded to provide Python data enthusiasts a place to share ideas and learn from each other.  A major goal of the first conference was to provide a venue for users across all the various domains of data analysis to share their experiences and their techniques, as well as highlight the triumphs and potential pitfalls of using Python for certain kinds of problems.

The first PyData Workshop was held in March 2012 at the Googleplex in Mountain View, CA. Many prominent individuals in the Python data and scientific computing community were on hand to deliver tutorials and how-to presentations. The workshops (and a Friday night hackathon) were a great success.

PyData evolved to be an accessible, community-driven conference, with tutorials for novices, advanced topical workshops for practitioners, and opportunities for package developers and users to meet in person.

PyData has since grown into an international conference series drawing thousands of participants each year.

Data Science Summit presented by Dato

The Data Science Summit is where you can learn how to create the next generation of intelligent applications. Visionaries share their new, surprising and inspiring ideas, academics teach the frontiers of machine learning, and industry leaders share practical case studies on how their data science and development teams are building new products and services with machine learning.

Who Should Attend:

The Data Science Summit is for anyone who wants to learn how to build the next generation of intelligent applications.

Developers and data scientists come to learn the latest technical innovations and to be inspired by the ways members of the community are applying them. Business leaders come to see what is possible for their teams and how they can reinvent their companies by leveraging data science, machine learning, and intelligent applications.


Dato held their first annual conference in 2011. Back then, they were an academic project called GraphLab based out of Carnegie Mellon University. The graph analytics software was open source for a couple of years, and companies frequently invited the Dato team to give talks about it.

Since Dato didn’t have enough people to visit all those companies, they decided to hold a workshop. Dato planned to invite a dozen companies, around 30 people in total. More than 300 people registered! They switched the workshop location to a much bigger room.

The first workshop convinced Dato that there was a large demand for an applied machine learning solution. The next year, 500 data scientists and researchers showed up; the third year, the workshop drew 700 registrations. The Dato team then decided it was time to rename the event from a workshop to a conference, since it had become one of the largest machine learning events in the world.

In 2015, Dato renamed the conference the Data Science Summit and expanded its scope to cover all aspects of data science. It drew over 1,100 participants, and the speakers included leaders in data science such as Prof. Rob Tibshirani (Stanford), Prof. Alex Smola (CMU), Prof. Chris Ré (Stanford), Prof. Carlos Guestrin (UW), Prof. Jeff Heer (UW), and Prof. Mike Jordan (Berkeley), as well as many others.


Datapalooza

Datapalooza is an immersive experience for the data science community where attendees will learn how to craft a data product in 3 days.

Who should attend?

Data science professionals, including data scientists, data engineers, and app developers.


Datapalooza was founded on the idea of community. Like Lollapalooza, the organizers want to deliver an amazing shared experience for data science professionals: learning, networking, shared success, and fun, with every participant building a data product in just three days.

Rich Data Summit

The Rich Data Summit is thrown by data scientists, for data scientists. The organizers focus not only on where they see data science going in the future, but the challenges data scientists have now, primarily that they spend up to 80% of their time cleaning and labeling data instead of doing something that’s, well, a bit more enjoyable.

Who should attend?

Data scientists, first and foremost. However, anyone who’s interested in how data affects business, government, sports, etc. will get value from attending.


CrowdFlower is a company founded by data scientists. They strive to make sure their platform frees data scientists to do the work they want to do by cutting the time spent cleaning and labeling data.

In the words of the CrowdFlower team:

“Everyone talks about their algorithms and the importance of ‘big data,’ but we don’t spend enough time working with what we have and working to make big data into something more useful, like rich data. Also, we like parties.”

H2O World

H2O World is a leading machine learning conference in Silicon Valley. Attendees will get to hear from rockstar data scientists like Hilary Mason and Monica Rogati and large organizations like Quora and Macy’s.

Who Should Attend:

This conference is aimed at developers and data scientists who are looking to leverage the power of machine learning to build smarter applications.  


H2O World was created in order to provide a place where data scientists and developers could come together to discuss the latest trends and use cases for machine learning technology. This is the second year of the conference and the organizers expect 700+ attendees over three days and 70+ talks and sessions.

MLconf

MLconf is a single-day, single-track event devoted to the Machine Learning and Data Science community in major cities, agnostic of any tool, platform, or company. MLconf events host speakers from industry, research, and universities to discuss recent research and applications of Machine Learning methodologies and practices.

MLconf has a “no sales pitch” motto; the organizers carefully curate content to help members of the community share what’s being used now.

Who Should Attend:

I could list a bunch of titles (Data Scientists, Researchers, Software Engineers, etc.), but really, if you’re interested in the recent research and application of Machine Learning methodologies and practices, you should attend.


In 2013, MLconf became a separate event, devoted to the Machine Learning and Data Science community in San Francisco, agnostic of any tool, platform or company.

The goal of MLconf is to host speakers from various industries, research and universities to discuss recent research and application of Machine Learning methodologies and practices.

In 2014, MLconf expanded to NYC and Atlanta in addition to San Francisco. In 2015, MLconf has hosted conferences in NYC, Atlanta, Seattle, and San Francisco, with plans to enter additional US cities and the UK in 2016.

Open Data Science Conference

The Open Data Science Conference (ODSC) brings together the data science community to help foster the exchange of innovative ideas and encourage the growth of open source software.  

In addition to global conferences, the ODSC team also runs meetups, workshops, code sprints and hackathons to help current and future data scientists learn, connect and collaborate.

Who Should Attend:

Decision makers (CTOs and lead data scientists) and the decision influencers (the people who actually use and build analytic tools).


ODSC began as a successful Meetup in Boston that helped practicing data scientists network and exchange ideas, with an emphasis on open source projects. The Meetup grew into an annual event, Boston Data Fest. Starting in 2015, the ODSC team also hosts ODSC Boston.

Lucene Revolution

Lucene/Solr Revolution brings together the Apache Lucene/Solr community from around the world to hear about the latest trends and use cases for the most widely deployed search platform on the planet. Attendees will learn from those who built the platform and work on it every day.

Who Should Attend:

Developers, engineers, and engineering managers: Apache open source experts and enthusiasts, along with developers and managers whose jobs are to build search and discovery applications within their organizations.


Lucene/Solr Revolution originated in 2010 as a small developer conference, hosted by Lucidworks, to bring the distributed (worldwide) Apache Solr community together for face-to-face collaboration.

Lucene Revolution has doubled in size since its start and has attracted the attention and support of large enterprises like Salesforce and Bloomberg, which have participated as conference sponsors to showcase their use of the Solr platform and to hire top Solr talent. With such a distributed developer and committer community, Lucene/Solr Revolution is the place to learn, collaborate, and meet the best and brightest minds in Solr, and it is the largest face-to-face gathering of Solr committers.

I hope this ode to the top data science conferences in the U.S. aids you in your quest for data science mastery and adventure. Without further ado, here are the conference discounts…

