Ray Summit 2020 has ended
Please note: All Sessions are in Pacific Daylight Time (PDT), UTC-7

Thursday, October 1 • 10:10am - 10:40am
Dask-on-Ray: Using Dask and Ray to Analyze Petabytes of Remote Sensing Data - Clark Zinzow, Descartes Labs

Dask is a parallel computing library for Python, providing parallel ndarray and dataframe abstractions that mimic the interfaces of NumPy and Pandas. Beneath the interfaces of these abstractions, Dask automatically partitions data and composes a task graph representing NumPy and Pandas operations over these data chunks. This task graph is then executed by a Dask scheduler, parallelizing the execution of tasks across cores and/or machines, yielding both task and data parallelism.
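As a rough illustration of the partition-then-graph idea described above, the following sketch uses only the Python standard library. It follows the general shape of a Dask graph (a dict mapping keys to literal values or to tuples of the form `(function, arg, ...)`), but the helper names `partition`, `build_sum_graph`, and `get` are hypothetical, and the serial evaluator is a simplification rather than Dask's actual implementation.

```python
def partition(data, chunk_size):
    """Split a list into consecutive chunks of at most chunk_size items."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def build_sum_graph(data, chunk_size):
    """Build a Dask-style task graph that sums `data` chunk by chunk.

    Keys and layout are illustrative only (hypothetical names).
    """
    graph = {}
    partial_keys = []
    for i, chunk in enumerate(partition(data, chunk_size)):
        graph[("chunk", i)] = chunk                    # literal data chunk
        graph[("partial", i)] = (sum, ("chunk", i))    # one task per chunk
        partial_keys.append(("partial", i))
    graph["total"] = (sum, partial_keys)               # reduce the partial sums
    return graph


def _is_task(expr):
    """A task is a tuple whose first element is callable."""
    return isinstance(expr, tuple) and len(expr) > 0 and callable(expr[0])


def _is_key(graph, expr):
    try:
        return expr in graph
    except TypeError:  # unhashable values can never be graph keys
        return False


def _evaluate(graph, expr, cache):
    if _is_task(expr):
        func, *args = expr
        return func(*(_evaluate(graph, arg, cache) for arg in args))
    if isinstance(expr, list):
        return [_evaluate(graph, item, cache) for item in expr]
    if _is_key(graph, expr):
        return get(graph, expr, cache)
    return expr  # plain literal


def get(graph, key, cache=None):
    """Serially compute `key`, evaluating each graph key at most once."""
    cache = {} if cache is None else cache
    if key not in cache:
        cache[key] = _evaluate(graph, graph[key], cache)
    return cache[key]


graph = build_sum_graph(list(range(10)), chunk_size=4)
print(get(graph, "total"))  # 45
```

The per-chunk `("partial", i)` tasks do not depend on one another, which is what lets a real scheduler run them in parallel across cores or machines.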

This talk covers the Dask-on-Ray scheduler that was recently added to Ray. By implementing a Dask scheduler that farms Dask tasks out to a Ray cluster, we can run the entire Dask ecosystem on top of Ray. The scheduler builds on several Ray features: decentralized, peer-to-peer scheduling that is resource-aware, locality-aware, and local-first; caching of scheduling decisions for fast worker-to-worker RPCs; zero-copy intra-node data sharing; fault tolerance for tasks, workers, and the GCS; and scheduling throughput that is unaffected by cluster state inspection. The result is a fast, scalable, fault-tolerant Dask backend with great ops tooling at no performance cost, capable of executing any Dask graph.
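To make the idea of "a scheduler that farms tasks out" more concrete, here is a minimal sketch of a function with the `get(dsk, keys)` shape that Dask schedulers share. It is not the Dask-on-Ray implementation: a standard-library `ThreadPoolExecutor` stands in for the Ray cluster, the driver resolves dependencies itself, and the name `pool_get` and the demo graph are hypothetical.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from operator import add, mul


def is_task(expr):
    """A Dask-style task is a tuple whose first element is callable."""
    return isinstance(expr, tuple) and len(expr) > 0 and callable(expr[0])


def pool_get(dsk, keys, max_workers=4):
    """Compute `keys` from graph `dsk`, submitting each task to a worker pool.

    The thread pool is an assumed stand-in for a remote cluster.
    """
    futures = {}  # graph key -> Future (for tasks) or plain value (for literals)

    def is_key(expr):
        try:
            return expr in dsk
        except TypeError:  # unhashable values can never be graph keys
            return False

    def resolve(value):
        """Block until any Futures inside `value` are finished."""
        if isinstance(value, Future):
            return value.result()
        if isinstance(value, list):
            return [resolve(item) for item in value]
        return value

    def submit_key(pool, key):
        if key not in futures:  # each key is submitted at most once
            futures[key] = materialize(pool, dsk[key])
        return futures[key]

    def materialize(pool, expr):
        if is_task(expr):
            func, *args = expr
            # Submit every dependency first so independent ones can overlap,
            # then wait for their results before submitting this task.
            pending = [materialize(pool, arg) for arg in args]
            return pool.submit(func, *[resolve(p) for p in pending])
        if isinstance(expr, list):
            return [materialize(pool, item) for item in expr]
        if is_key(expr):
            return submit_key(pool, expr)
        return expr  # plain literal

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return resolve(materialize(pool, keys))


demo_graph = {
    "x": 1,
    "y": (add, "x", 10),
    "z": (mul, "y", 2),
    "w": (add, "y", "z"),
}
print(pool_get(demo_graph, "w"))         # 33
print(pool_get(demo_graph, ["y", "z"]))  # [11, 22]
```

With Ray and Dask installed, the released integration is, to the best of my knowledge, selected in a similar way: by passing `scheduler=ray_dask_get` (imported from `ray.util.dask`) to `dask.compute`. Check the Ray documentation for the current entry point.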

Finally, we’ll talk about how Descartes Labs is leveraging Ray and this Dask-on-Ray integration to provide a geospatial data analysis product, Workflows. Using a highly scalable Ray-based backend, Workflows provides both low-latency interactive data analysis and massively scalable batch computation, distributing work across tens of thousands of cores and allowing users to process petabytes of remotely sensed data with ease.

Clark Zinzow

Software Engineer, Descartes Labs
Clark Zinzow is a software engineer at Descartes Labs, where he's building a platform for geospatial data analysis that puts tens of thousands of cores and petabytes of remote sensing data at the user's fingertips. He loves working at the boundaries of service, data, and compute scale...

Thursday October 1, 2020 10:10am - 10:40am PDT
Virtual 4
  Ray and Its Libraries
  • Slides Included: Yes