I'm working on my own little copy of the classic NVIDIA Tacotron 2 model (the one hosted at https://github.com/NVIDIA/tacotron2/), and I've run into a couple of problems, as happens. The real-time vocal synthesis already works and is cool and all, but I'm more interested in using transfer learning to train a robust model with as few artifacts as is feasible. So far my attempts at transfer learning have failed pretty hard: all I've gotten out is jumbled consonants and sibilance that resemble the tone of my dataset without any of the language. My next step is to clean up any clips in my dataset longer than 15 seconds, but it's hard to say whether that'll help, since the model is already totally destabilized at checkpoint_0.
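For anyone wanting to try the same clip cleanup, here is a minimal sketch of one way to do it: drop every entry whose audio runs longer than 15 seconds. It assumes the `path|transcript` filelist layout used by the NVIDIA/tacotron2 repo and plain PCM WAV files readable by Python's standard `wave` module. The function names and the file paths in the usage note are made up for illustration, and trimming or splitting the long clips instead of dropping them would be just as reasonable.

```python
import wave
from pathlib import Path

MAX_SECONDS = 15.0  # threshold mentioned in the post; adjust for your data


def clip_seconds(wav_path):
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(str(wav_path), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def filter_filelist(src, dst, max_seconds=MAX_SECONDS):
    """Copy `path|transcript` lines whose audio is at most max_seconds long.

    Returns a (kept, dropped) pair of line counts.
    """
    kept, dropped = [], 0
    for line in Path(src).read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue  # skip blank lines
        wav_path = line.split("|", 1)[0]  # assumed layout: path|transcript
        if clip_seconds(wav_path) <= max_seconds:
            kept.append(line)
        else:
            dropped += 1
    Path(dst).write_text("\n".join(kept) + ("\n" if kept else ""), encoding="utf-8")
    return len(kept), dropped
```

Something like `filter_filelist("filelists/train.txt", "filelists/train_short.txt")` (hypothetical paths) would then write the trimmed list and report how many clips were kept and dropped. This only addresses clip length, so it may not fix the instability on its own.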
In general, I think it'd be pretty great if there were a bit more information out there on how to set up the model, which hyperparameters need tweaking, and how to make sure you have a good dataset. I feel like I have some knowledge of these things (but definitely not enough).
A few links for those interested:
https://github.com/NVIDIA/tacotron2/issues/223
GitHub issue thread about transfer learning for Tacotron 2
https://arxiv.org/pdf/1907.07769.pdf
(Also linked in the issue above)
https://github.com/NVIDIA/DeepLearningExamples/tree/master/PyTorch/SpeechSynthesis/Tacotron2
The somewhat more sophisticated NVIDIA repo of Tacotron 2, which uses some fancy thing called mixed-precision training, whatever that is. It seems to be functionally the same as the regular NVIDIA/tacotron2 repo, but I haven't messed around with it much, as I can't seem to get the Docker image up on a Paperspace machine.
If any of you out there have had some success, I'm sure a lot of us could benefit from the knowledge: dataset prep, hyperparameters, whatever you've got!
Posted 4 years ago