
Threaded Tasks in PySpark Jobs

by Rick Bahague · August 3rd, 2019

Too Long; Didn't Read

There are circumstances when tasks (Spark actions such as save, count, etc.) in a PySpark job can be spawned on separate threads. With fair scheduling enabled, Spark assigns tasks between jobs in a "round robin" fashion, so that all jobs get a roughly equal share of cluster resources. This means that short jobs submitted while a long job is running can start receiving resources right away and still get good response times, without waiting for the long job to finish. An important reminder is to set 'spark.scheduler.mode' to 'FAIR' in the SparkContext's configuration. A sketch of the pattern follows below.
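Here is a minimal sketch of that pattern. The FAIR scheduler setting comes from the article; the DataFrame, output path, and function names are illustrative assumptions, not from the original.

```python
import threading
from pyspark.sql import SparkSession

# FAIR scheduling lets concurrently submitted jobs share executors
# instead of queuing FIFO behind one another.
spark = (
    SparkSession.builder
    .appName("threaded-actions")
    .config("spark.scheduler.mode", "FAIR")
    .getOrCreate()
)

# Hypothetical DataFrame standing in for real job data.
df = spark.range(1_000_000)

def save_parquet(frame, path):
    # A write is a Spark action, so it triggers its own Spark job;
    # launching it from a Python thread lets it run alongside others.
    frame.write.mode("overwrite").parquet(path)

def count_rows(frame):
    # count() is another action, submitted concurrently from a second thread.
    print("row count:", frame.count())

threads = [
    threading.Thread(target=save_parquet, args=(df, "/tmp/out_parquet")),
    threading.Thread(target=count_rows, args=(df,)),
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Because both actions are submitted while the other is still running, the fair scheduler interleaves their tasks rather than blocking the second job behind the first.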


About Author

Rick Bahague (@rick-bahague)
Free & Open Source Advocate. Data Geek - Big or Small.
