Anyone have any good resources on the Tacotron-2 setup?

I'm working on my own little copy of the classic NVIDIA/Tacotron-2 model (the one hosted at https://github.com/NVIDIA/tacotron2/). I've run into a couple of problems, as happens. And while the already working real-time vocal synthesis is cool and all, I'm more interested in using transfer learning to train a robust model with as few aberrations as is feasible. So far my attempts at transfer learning have failed pretty hard: all I've gotten out are jumbled consonants and sibilance that resemble the tone of my dataset without any of the language. My next step is to try to trim or remove any clips in my dataset longer than 15 seconds, but it's hard to say if that'll help, since the model is already totally destabilized as early as checkpoint_0.
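
On the clip-length cleanup, here is a minimal sketch of how it could be done, assuming the dataset is uncompressed PCM WAV and the filelist uses the `path|text` line format found in the NVIDIA/tacotron2 `filelists/` directory. The function names and the `audio_root` parameter are made up for illustration and are not part of the repo.

```python
import wave
from pathlib import Path

MAX_SECONDS = 15.0  # cutoff mentioned in the post; an assumption, tune for your data


def wav_duration_seconds(path):
    """Return the length of a PCM WAV file in seconds."""
    with wave.open(str(path), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def filter_filelist(lines, audio_root=".", max_seconds=MAX_SECONDS):
    """Split 'path|text' lines into (kept, dropped) by clip duration."""
    kept, dropped = [], []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        rel_path = line.split("|", 1)[0]
        duration = wav_duration_seconds(Path(audio_root) / rel_path)
        if duration <= max_seconds:
            kept.append(line)
        else:
            dropped.append(line)
    return kept, dropped
```

Writing `kept` back out as a new filelist, and looking through `dropped` by hand, keeps the original files untouched, so the cutoff can be changed later without redoing anything.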

In general I think it'd be pretty great if there were a bit more information out there on how to set up the model, which hyperparameters need tweaking, and how to make sure you have a good dataset. I feel like I have some knowledge of these things (but definitely not enough).
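
One hyperparameter that seems to matter for transfer learning: as far as I can tell, the NVIDIA/tacotron2 README describes a `--warm_start` flag for `train.py`, and the repo's hparams include an `ignore_layers` list (defaulting to `['embedding.weight']`) naming the pretrained weights to skip when loading. The sketch below shows that idea with plain dicts standing in for state dicts. It is an illustration only, not the repo's code; the function name, the `decoder.example.weight` key, and the values are hypothetical.

```python
def warm_start_state(model_state, pretrained_state, ignore_layers):
    """Overlay pretrained weights onto a fresh model, skipping ignored layers.

    Layers listed in ignore_layers keep their fresh initialization,
    everything else is taken from the pretrained checkpoint.
    """
    merged = dict(model_state)  # start from the freshly initialized weights
    for name, weights in pretrained_state.items():
        if name not in ignore_layers:
            merged[name] = weights
    return merged


# Hypothetical stand-ins for real tensors
fresh = {"embedding.weight": [0.0, 0.0], "decoder.example.weight": [0.0]}
pretrained = {"embedding.weight": [0.3, 0.7], "decoder.example.weight": [0.9]}

state = warm_start_state(fresh, pretrained, ignore_layers=["embedding.weight"])
```

Here `state` keeps the fresh embedding and takes the pretrained decoder weight. Skipping the embedding mostly makes sense when the new dataset uses a different symbol set than the pretrained model.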

A few links for those interested:

https://github.com/NVIDIA/tacotron2/issues/223
GitHub issue thread about transfer learning for Tacotron 2

https://arxiv.org/pdf/1907.07769.pdf
(Also linked in the issue above)

https://github.com/NVIDIA/DeepLearningExamples/tree/master/PyTorch/SpeechSynthesis/Tacotron2
The somewhat more sophisticated NVIDIA repo for Tacotron 2, which uses some fancy thing called mixed-precision training, whatever that is. While it seems to be functionally the same as the regular NVIDIA/tacotron2 repo, I haven't messed around with it too much, as I can't seem to get the Docker image up on a Paperspace machine.
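
For what it's worth, mixed-precision training generally means doing most of the math in 16-bit floats to save memory and time, while keeping a 32-bit copy of the weights and multiplying the loss by a scale factor so that small gradients don't round to zero in 16 bits. The toy sketch below only demonstrates that underflow and the scaling trick using Python's `struct` module. It is not how the NVIDIA repo implements it, and the gradient and scale values are made up.

```python
import struct


def to_fp16(x):
    """Round a Python float to the nearest IEEE half-precision value."""
    return struct.unpack("e", struct.pack("e", x))[0]


tiny_grad = 1e-8      # hypothetical gradient, too small for fp16
loss_scale = 1024.0   # hypothetical scale factor

naive = to_fp16(tiny_grad)                # stored directly in fp16: lost
scaled = to_fp16(tiny_grad * loss_scale)  # scaled first: representable
recovered = scaled / loss_scale           # unscaled again in full precision

print(naive, recovered)
```

The unscaled value comes out as zero, while the scaled-then-unscaled value stays close to the original gradient.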

If any of you out there have had some success, I'm sure a lot of us could benefit from the knowledge; dataset prep, hyperparameters, whatever you got!

Author

scrippington

Subreddit

r/VocalSynthesis

Posted: 4 years ago