Two months ago, I started work on a side project experimenting with artificial speech synthesis using recently published machine learning methods. The pace of advancement in speech quality produced by neural network models following the deep learning revolution in 2012 was impressive, and with the release of the WaveNet paper by Google's DeepMind in 2016 it was clear that the uncanny valley had been crossed. It was now possible to generate artificial speech indistinguishable from human speech, as judged by native speakers. I was amazed by the quality demonstrated in the results of the WaveNet paper when it was published, and began casually following the literature, though I wasn't actively contributing to it. I was interested in the application of speech synthesis to the creative industry - in particular, voice generation for games and films. I grew up a media junkie, and though my academic and professional career wound up taking an engineering path, I have remained passionate about artistic storytelling my whole life.

I decided to explore the space from an entrepreneurial perspective to see what markets might exist around speech synthesis models as content creation tools. I delved a bit deeper into the literature on the state-of-the-art models and decided that a good way to learn how they worked would be to glue together a handful of existing open-source components to produce a realistic text-to-speech demonstration of a highly recognizable voice.

I experimented with a few voices, eventually settling on psychologist, author, and podcaster Jordan Peterson for a few reasons, not the least of which being that I am a fan of his work and style of intellectual discourse, even though I don't agree with all of his viewpoints.

I started working on the project part-time at the end of June. After about four weeks of learning how the components worked, experimenting with data processing techniques, and training a few different models, I had a pretty convincing speech generator. Building the site was actually harder for me than developing the speech model. The next step was to build a tech demo that would let users generate audio from the model using a simple front end: type and listen. I needed to learn a few technologies, having never built a website myself before. Getting the site running smoothly took another month. I mention this to emphasize that creating this model was not difficult. All of the technology was readily available. If I can build it in a few weeks, there are surely many engineers more talented than I who could do it better, faster. We must decide as a society how we want to deal with this fact. On August 14th, I posted a link to the site ( ) on Reddit.
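The "type and listen" front end amounts to a single endpoint: accept text, run it through the model, return audio. The post doesn't name the actual web stack or model, so this is only a minimal sketch using Python's standard library; `synthesize` is a hypothetical placeholder that returns a silent WAV where a real TTS model would go, and the `/` endpoint and port are assumptions.

```python
# Minimal sketch of a "type and listen" demo service.
# `synthesize` is a hypothetical stand-in for a real TTS model;
# the original site's stack is not described in the post.
import io
import wave
from http.server import BaseHTTPRequestHandler, HTTPServer


def synthesize(text: str) -> bytes:
    """Placeholder TTS: returns one second of silent 16-bit mono WAV
    at 22.05 kHz instead of actual generated speech."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)        # mono
        w.setsampwidth(2)        # 16-bit samples
        w.setframerate(22050)    # common TTS sample rate
        w.writeframes(b"\x00\x00" * 22050)  # one second of silence
    return buf.getvalue()


class TTSHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read the user's text from the request body.
        length = int(self.headers.get("Content-Length", 0))
        text = self.rfile.read(length).decode("utf-8")
        audio = synthesize(text)
        # Return the generated audio so the browser can play it.
        self.send_response(200)
        self.send_header("Content-Type", "audio/wav")
        self.send_header("Content-Length", str(len(audio)))
        self.end_headers()
        self.wfile.write(audio)


# To run the demo locally (blocks until interrupted):
# HTTPServer(("localhost", 8080), TTSHandler).serve_forever()
```

In a real deployment the synthesis step is slow enough that it would typically be pushed to a background worker queue rather than run inside the request handler, which may be part of why getting the site running smoothly took another month.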