Azure Synapse Notebook - pyspark

Hey Guys,

Trying to figure out why my spark.read.format('json').load(path) call is failing. I'm using a wildcard path, and whenever there are enough files that the load takes more than 120 seconds, it fails with the following error and no other information:

TaskCanceledException: A task was canceled.
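
Roughly, the read looks like this; the abfss path below is just a placeholder, not my actual container or folder layout:

# Sketch of the failing read; the wildcard path is a placeholder, not my real one.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()  # Synapse notebooks already provide `spark`

path = "abfss://container@account.dfs.core.windows.net/landing/*/*.json"  # placeholder
df = spark.read.format("json").load(path)
df.printSchema()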

I see nothing but successful jobs in the job execution. When I look at the monitor logs, there are no errors at all; it's finding the paths just fine and loading them. When I look at the Spark history, there are no errors whatsoever. Livy reports everything is fine.

But the task was canceled.

I've gone and set up my config.txt file, changing every setting I could find that was pointing at 120s to something greater (a per-session way of applying the same properties is sketched after the list). I'm using:

spark.rpc.message.maxSize 512
spark.rpc.lookupTimeout 100000
spark.scheduler.excludeOnFailure.unschedulableTaskSetTimeout 10000
spark.network.timeout 200000
spark.executor.heartbeatInterval 50000
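
For what it's worth, I believe the same properties can also be set per-session with the %%configure magic in the first notebook cell, so they're in place before the Spark session starts (as far as I can tell, the RPC/network timeouts can't be changed with spark.conf.set once the session is up). A sketch:

%%configure -f
{
    "conf": {
        "spark.rpc.message.maxSize": "512",
        "spark.rpc.lookupTimeout": "100000",
        "spark.scheduler.excludeOnFailure.unschedulableTaskSetTimeout": "10000",
        "spark.network.timeout": "200000",
        "spark.executor.heartbeatInterval": "50000"
    }
}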
