
Threaded Tasks in PySpark Jobs

by Rick Bahague · August 3rd, 2019

Too Long; Didn't Read

There are circumstances when tasks (Spark actions such as save, count, etc.) in a PySpark job can be spawned on separate threads. With fair scheduling enabled, Spark assigns tasks between jobs in a "round robin" fashion, so that all jobs get a roughly equal share of cluster resources. This means that short jobs submitted while a long job is running can start receiving resources right away and still get good response times, without waiting for the long job to finish. An important reminder is to set 'spark.scheduler.mode' to 'FAIR' in the SparkContext's configuration. A sketch of the pattern follows below.
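Here is a minimal sketch of that pattern. The FAIR scheduler setting comes from the article; the DataFrame, output path, and function names are illustrative assumptions, not from the original.

```python
import threading
from pyspark.sql import SparkSession

# FAIR scheduling lets concurrently submitted jobs share executors
# instead of queuing FIFO behind one another.
spark = (
    SparkSession.builder
    .appName("threaded-actions")
    .config("spark.scheduler.mode", "FAIR")
    .getOrCreate()
)

# Hypothetical DataFrame standing in for real job data.
df = spark.range(1_000_000)

def save_parquet(frame, path):
    # A write is a Spark action, so it triggers its own Spark job;
    # launching it from a Python thread lets it run alongside others.
    frame.write.mode("overwrite").parquet(path)

def count_rows(frame):
    # count() is another action, submitted concurrently from a second thread.
    print("row count:", frame.count())

threads = [
    threading.Thread(target=save_parquet, args=(df, "/tmp/out_parquet")),
    threading.Thread(target=count_rows, args=(df,)),
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Because both actions are submitted while the other is still running, the fair scheduler interleaves their tasks rather than blocking the second job behind the first.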


About Author

Rick Bahague (@rick-bahague)
Free & Open Source Advocate. Data Geek - Big or Small.
