Facebook open-sources VoiceLoop, a method for synthesizing speech in multiple speakers' voices

Posted by jopen 8 years ago | 11K views | Tags: Facebook, speech

Facebook open-sources VoiceLoop, which synthesizes new speech from text using voices sampled in the wild

VoiceLoop is a neural text-to-speech (TTS) system that can transform text to speech in voices sampled in the wild. Some demo samples can be found here.

Quick Start

Follow the instructions in Setup and then simply execute:

python generate.py  --npz data/vctk/numpy_features_valid/p318_212.npz --spkr 13 --checkpoint models/vctk/bestmodel.pth

The results will be placed in models/vctk/results. It will generate 2 samples:

The generated sample will be saved with the gen_10.wav extension.
Its ground-truth (test) sample is also generated and is saved with the orig.wav extension.
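
To pick up the two output files from a script, a minimal sketch follows. It relies only on what is stated above: results land in models/vctk/results and the file names end in gen_10.wav and orig.wav. The helper name split_results is hypothetical and is not part of the repository.

```python
import os


def split_results(results_dir):
    """Return (generated, ground_truth) lists of .wav paths in results_dir.

    Files are classified purely by the name suffixes mentioned above;
    anything else in the directory is ignored.
    """
    generated, ground_truth = [], []
    for name in sorted(os.listdir(results_dir)):
        path = os.path.join(results_dir, name)
        if name.endswith("gen_10.wav"):
            generated.append(path)
        elif name.endswith("orig.wav"):
            ground_truth.append(path)
    return generated, ground_truth


if __name__ == "__main__":
    results = os.path.join("models", "vctk", "results")
    if os.path.isdir(results):
        gen, orig = split_results(results)
        print("generated:", gen)
        print("ground truth:", orig)
```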

You can also generate the same text but with a different speaker, specifically:

python generate.py  --npz data/vctk/numpy_features_valid/p318_212.npz --spkr 18 --checkpoint models/vctk/bestmodel.pth
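
When the same text is wanted for several speakers, the command line can be assembled in a loop. The sketch below uses only the flags shown above (--npz, --spkr, --checkpoint); the function name build_generate_command is hypothetical, and the script only prints the commands rather than running them.

```python
def build_generate_command(npz, speaker, checkpoint):
    """Build the argument list for one generate.py call (flags as shown above)."""
    return [
        "python", "generate.py",
        "--npz", npz,
        "--spkr", str(speaker),
        "--checkpoint", checkpoint,
    ]


if __name__ == "__main__":
    npz = "data/vctk/numpy_features_valid/p318_212.npz"
    checkpoint = "models/vctk/bestmodel.pth"
    for speaker in (13, 18):  # the two speaker ids used in the examples above
        cmd = build_generate_command(npz, speaker, checkpoint)
        # Print only; to execute, pass cmd to subprocess.check_call
        # from the repository root.
        print(" ".join(cmd))
```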

This generates the same text spoken by speaker 18.

[Figure: attention plot for the generated sample]

Setup

Requirements: Linux/OSX, Python 2.7 and PyTorch 0.1.12. The current version of the code requires CUDA support for training. Generation can be done on the CPU.

git clone https://github.com/facebookresearch/loop.git
cd loop
pip install -r scripts/requirements.txt

Data

The data used to train the models in the paper can be downloaded via:

bash scripts/download_data.sh

The script downloads and preprocesses a subset of VCTK. This subset contains speakers with American accents.

The dataset was preprocessed using Merlin: from each audio clip we extracted vocoder features using the WORLD vocoder. After downloading, the dataset will be located under the subfolder data as follows:

loop
├── data
    └── vctk
        ├── norm_info
        │   └── norm.dat
        ├── numpy_features
        │   ├── p294_001.npz
        │   ├── p294_002.npz
        │   └── ...
        └── numpy_features_valid

The preprocessing pipeline can be run using the following script by Kyle Kastner: https://gist.github.com/kastnerkyle/cc0ac48d34860c5bb3f9112f4d9a0300.
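
To see what a preprocessed .npz file contains without assuming anything about its array names, the archive members can simply be listed. An .npz file is a zip archive holding one .npy member per array, so the sketch below uses only the standard library. The helper name list_npz_arrays is hypothetical. If NumPy is installed, numpy.load(path).files should report the same names.

```python
import os
import zipfile


def list_npz_arrays(npz_path):
    """Return the names of the arrays stored in an .npz file.

    The names are read from the zip directory, so no array data is loaded.
    """
    with zipfile.ZipFile(npz_path) as archive:
        return [name[:-len(".npy")] for name in archive.namelist()
                if name.endswith(".npy")]


if __name__ == "__main__":
    # File name taken from the Quick Start command above.
    example = os.path.join("data", "vctk", "numpy_features_valid",
                           "p318_212.npz")
    if os.path.exists(example):
        print(list_npz_arrays(example))
```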

Pretrained Models

Pretrained models can be downloaded via:

bash scripts/download_models.sh

After downloading, the models will be located under subfolder models as follows:

loop
├── data
├── models
    ├── vctk
    │   ├── args.pth
    │   └── bestmodel.pth
    └── vctk_alt

SPTK and WORLD

Finally, speech generation requires SPTK 3.9 and the WORLD vocoder, as in Merlin. To download the executables:

bash scripts/download_tools.sh

This results in the following subdirectories:

loop
├── data
├── models
├── tools
    ├── SPTK-3.9
    └── WORLD
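
After running the three download scripts, a quick check that everything landed where the listings above say can save a failed generation run. The following is a minimal sketch: the path list is copied from the directory trees in this article, and the names EXPECTED_PATHS and missing_paths are hypothetical.

```python
import os

# Paths taken from the directory listings above, relative to the loop checkout.
EXPECTED_PATHS = [
    os.path.join("data", "vctk", "norm_info", "norm.dat"),
    os.path.join("data", "vctk", "numpy_features_valid"),
    os.path.join("models", "vctk", "args.pth"),
    os.path.join("models", "vctk", "bestmodel.pth"),
    os.path.join("tools", "SPTK-3.9"),
    os.path.join("tools", "WORLD"),
]


def missing_paths(root, expected=EXPECTED_PATHS):
    """Return the expected paths that do not exist under root."""
    return [p for p in expected if not os.path.exists(os.path.join(root, p))]


if __name__ == "__main__":
    missing = missing_paths(".")
    if missing:
        print("Missing, rerun the matching download script:")
        for p in missing:
            print("  " + p)
    else:
        print("Data, models and tools are all in place.")
```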

Training

To train a new model on vctk, first train the model using a noise level of 4 and an input sequence length of 100:

python train.py --expName vctk --data data/vctk --noise 4 --seq-len 100 --epochs 90

Then, continue training the model using a noise level of 2, on full sequences:

python train.py --expName vctk_noise_2 --data data/vctk --checkpoint checkpoints/vctk/bestmodel.pth --noise 2 --seq-len 1000 --epochs 90
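
The two-stage schedule above can be written down as data so that both commands come from one place. This is a sketch only: it reuses the flags shown in the commands above, the function names are hypothetical, and it assumes (as the second command suggests) that the first stage writes its best model to checkpoints/vctk/bestmodel.pth.

```python
def build_train_command(exp_name, noise, seq_len, epochs=90,
                        data="data/vctk", checkpoint=None):
    """Build the argument list for one train.py call (flags as shown above)."""
    cmd = ["python", "train.py", "--expName", exp_name, "--data", data]
    if checkpoint is not None:
        cmd += ["--checkpoint", checkpoint]
    cmd += ["--noise", str(noise), "--seq-len", str(seq_len),
            "--epochs", str(epochs)]
    return cmd


def two_stage_schedule():
    """Return the two commands above: short noisy sequences, then full ones."""
    first = build_train_command("vctk", noise=4, seq_len=100)
    # Assumption: stage 1 saved its best model under checkpoints/vctk/,
    # which is the path used by the second command above.
    second = build_train_command(
        "vctk_noise_2", noise=2, seq_len=1000,
        checkpoint="checkpoints/vctk/bestmodel.pth")
    return [first, second]


if __name__ == "__main__":
    for cmd in two_stage_schedule():
        print(" ".join(cmd))
```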

Citation

If you find this code useful in your research, please cite:

@article{taigman2017voice,
  title           = {Voice Synthesis for in-the-Wild Speakers via a Phonological Loop},
  author          = {Taigman, Yaniv and Wolf, Lior and Polyak, Adam and Nachmani, Eliya},
  journal         = {ArXiv e-prints},
  archivePrefix   = "arXiv",
  eprinttype      = {arxiv},
  eprint          = {1707.06588},
  primaryClass    = "cs.CL",
  year            = {2017},
  month           = July,
}

License

Loop has a CC-BY-NC license.

Code: https://github.com/facebookresearch/loop

Paper: https://arxiv.org/abs/1707.06588
