2 years, 7 months ago
Captain C-3PO by Jeff Nickel
If you have ever dealt with audio recording, whether character voice-over for a game or narration for a video, you have certainly noticed that it is not cheap. It is important to get everything right the first time to keep costs down. The same goes for audio localization: every mistake is multiplied by the number of languages. In this article we share tips on how to work with recording studios and localization services, and how to streamline and speed up the process, reduce risks, and at the same time cut the cost of audio localization. It does not matter whether you order these services from us at Alconost or from another company: knowing all the pitfalls will definitely come in handy.
2 years, 8 months ago
My name is Pyotr. I am a sound designer at the studio working on the InSomnia project. In this article I would like to briefly describe the process of creating the sound for heavy armor. Here is what the final result looks like:
In my articles about moving to the Russian K1986BE92QI microcontroller I once described generating sound with the microcontroller. Back then my only task was to play back the data. The data itself was produced from MIDI files by very exotic methods, such as the one in this article. Such methods have a right to exist if you need playback data only a few times in your life. But I quite often face tasks where the controller has to produce a fairly complex sound, or where sound is just an optional extra, and converting MIDI files by such exotic methods becomes far from trivial. In this short series of articles I set myself the task of creating (and, while at it, describing the creation process in detail) a universal program that converts MIDI files into a format the microcontroller can use and also generates all the initialization code the microcontroller needs.
By the end of this article the program's core functionality will be implemented: building arrays of notes and durations from a MIDI file. If you are interested, read on below the cut.
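As a rough illustration of what "arrays of notes and durations" could look like, here is a minimal Python sketch. It is not the program from the article: the event list, the array names `note_hz` and `note_ms`, and the choice of `uint16_t` are all assumptions made for the example. The only fixed fact it relies on is the standard equal-temperament mapping in which MIDI note 69 is A4 at 440 Hz.

```python
# Hypothetical input: (MIDI note number, duration in milliseconds).
# A real converter would derive these from note-on/note-off events in the file.
events = [(69, 500), (71, 250), (72, 250), (76, 1000)]

def midi_note_to_hz(note):
    """Equal-tempered pitch: note 69 is A4 = 440 Hz, 12 notes per octave."""
    return 440.0 * 2.0 ** ((note - 69) / 12)

def build_arrays(note_events):
    """Split the events into a frequency array (rounded Hz) and a duration array."""
    freqs = [round(midi_note_to_hz(note)) for note, _ in note_events]
    durations = [duration for _, duration in note_events]
    return freqs, durations

def to_c_array(name, values):
    """Render a list of integers as a C array definition (assumed uint16_t)."""
    body = ", ".join(str(v) for v in values)
    return f"const uint16_t {name}[{len(values)}] = {{ {body} }};"

freqs, durations = build_arrays(events)
print(to_c_array("note_hz", freqs))
print(to_c_array("note_ms", durations))
```

Keeping pitch and duration in two parallel arrays is only one possible layout; a firmware that drives a timer directly might prefer precomputed timer reload values instead of frequencies in hertz.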
2 years, 9 months ago
People perceive sound, like color, differently. For example, what seems too loud or low-quality to one person may be perfectly normal to another.
In our work on Yandex.Music it is always important to keep in mind the various subtleties that sound hides. What is loudness, how does it change, and what does it depend on? How do audio filters work? What kinds of noise are there? How does sound change? How do people perceive it?
We learned a great deal about all this while working on our project, and today I will try to explain in plain terms some basic concepts you need to know if you deal with digital audio processing. This article contains no serious mathematics such as fast Fourier transforms; those formulas are easy to find online. I will describe the essence and meaning of the things you will run into.
You may consider the occasion for this post to be that we added to the Yandex.Music apps the ability to listen to tracks in high quality (320 kbps). Or you may not. So.
In many cases the task of obtaining (computing) a signal's spectrum looks as follows. There is an ADC that, at a sampling rate Fd, converts the continuous signal arriving at its input during time T into N digitized samples. The array of samples is then fed to some routine that returns N/2 numeric values (the programmer, who pulled the routine off the Internet, assures us that it performs a Fourier transform).
To check that the program works correctly, we build an array of samples as the sum of two sinusoids, sin(10*2*pi*x) + 0.5*sin(5*2*pi*x), and feed it to the routine. The program drew the following:
Fig. 1. Time-domain plot of the signal
Fig. 2. Spectrum of the signal
The spectrum plot shows two spikes (harmonics): 5 Hz with an amplitude of 0.5 V and 10 Hz with an amplitude of 1 V, exactly as in the formula of the original signal. Everything is fine, well done, programmer! The program works correctly.
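The check described above can be reproduced with a few lines of Python. This is a sketch under stated assumptions, not the routine from the text: it uses a naive discrete Fourier transform from the standard library, and the sampling rate of 64 Hz and record length of 5 s are values chosen for the example. With those values both tones fit a whole number of periods into the record, so the amplitudes should come out as 0.5 at 5 Hz and 1.0 at 10 Hz.

```python
import cmath
import math

def sample_signal(fd, t):
    """Sample sin(10*2*pi*x) + 0.5*sin(5*2*pi*x) at rate fd (Hz) for t seconds."""
    n = int(round(fd * t))
    return [math.sin(10 * 2 * math.pi * i / fd) + 0.5 * math.sin(5 * 2 * math.pi * i / fd)
            for i in range(n)]

def amplitude_spectrum(samples):
    """Naive DFT returning N/2 amplitudes; bin k corresponds to k * fd / N Hz."""
    n = len(samples)
    amps = []
    for k in range(n // 2):
        acc = sum(s * cmath.exp(-2j * math.pi * k * i / n)
                  for i, s in enumerate(samples))
        # A one-sided spectrum folds the negative-frequency half into bins k > 0.
        scale = 1 if k == 0 else 2
        amps.append(scale * abs(acc) / n)
    return amps

FD = 64.0  # assumed sampling rate, Hz
T = 5.0    # assumed record length, s

samples = sample_signal(FD, T)
amps = amplitude_spectrum(samples)

# Keep only bins that stand out from numerical noise; bin k sits at k / T Hz.
peaks = {k / T: round(a, 3) for k, a in enumerate(amps) if a > 0.01}
print(peaks)
```

The naive transform costs O(N^2) operations, which is fine for a few hundred samples; a fast Fourier transform gives the same values for longer records.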
This means that if we feed a real signal made of a mix of two sinusoids to the ADC input, we will get a similar spectrum consisting of two harmonics.
In sum: our real measured signal, lasting 5 s, digitized by the ADC (that is, represented by discrete samples), has a discrete non-periodic spectrum.
From a mathematical point of view, how many errors are there in this sentence?
Now management has decided that 5 seconds is too long; let's measure the signal for 0.5 s.
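One general property of the discrete Fourier transform is worth keeping in mind here: the spacing between frequency bins equals 1/T, so a shorter record gives a coarser frequency grid. The sketch below only illustrates that property for the two record lengths mentioned in the text; the helper names are hypothetical, and it does not claim to reproduce the rest of the original argument.

```python
def bin_spacing(t):
    """Distance between neighbouring DFT bins, in Hz, for a record of t seconds."""
    return 1.0 / t

def on_bin(freq_hz, t):
    """True when freq_hz fits a whole number of periods into t seconds."""
    periods = freq_hz * t
    return abs(periods - round(periods)) < 1e-9

for t in (5.0, 0.5):
    print(f"T = {t} s: spacing {bin_spacing(t)} Hz, "
          f"5 Hz on a bin: {on_bin(5, t)}, 10 Hz on a bin: {on_bin(10, t)}")
```

A tone that does not land on a bin spreads its energy over the neighbouring bins, so its spike no longer appears as a single clean line.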
2 years, 10 months ago
I think that for many people lossy audio compression resembles a magic black box in which, by some mathematical alchemy, data is compressed in an astonishingly intricate way at the cost of losing redundant information that is poorly distinguishable or inaudible to the human ear, and, as a result, some decline in recording quality. However, it is not easy to assess the significance of such losses and understand their nature right away. Today we will try to find out what is actually going on and what makes compressing data by a factor of tens possible at all...
It is time to lift the veil, open the door, and take a look for ourselves at the mysterious algorithm that stirs minds and hearts. Welcome to the exposé session!
2 years, 10 months ago
The number and volume of materials on PureData keeps growing, so the time has come to collect them in one place. This post is a kind of table of contents for the articles I have written. Many of the principles and ideas described in them apply not only to PD but to any MAX-like language. Good luck with your experiments.
I became curious how well Microsoft Speech can recognize speech. As the source for recognition I decided to take the audio stream of police radio traffic from the site youarelistening.to.
2 years, 10 months ago
Yandex recently released the experimental Conversation app, which helps deaf and hard-of-hearing people communicate. The International Week of the Deaf is under way right now, and we decided it was a good occasion to talk about our app: why we built it and how it came about that Yandex supported our idea, and also how working on a hackathon prototype differs from releasing a full-fledged product.
Last fall at MFTI (MIPT), where I studied, Yandex's base department gave us the course "Creation of New Internet Products". It was conceived as a kind of startup practicum in which we had to come up with something that would successfully solve an existing problem using Yandex technologies. Several of my classmates and I thought that connecting people who are cut off from ordinary voice communication with the rest of the hearing world was a task that met these criteria. According to the World Health Organization, 10% of the Earth's inhabitants have hearing problems, and 1.5 to 2% of them suffer from severe impairment. In Russia that is about 2.2 million people. It would be great to make something that could help these people in everyday life.
2 years, 10 months ago
CMU Sphinx is currently the largest project for human speech recognition. The toolkit includes the following programs and libraries:
Pocketsphinx: a small program that takes acoustic models, grammars, and dictionaries as input, along with an audio stream (either an audio file, or a stream it captures from a microphone itself). It outputs the recognized text. It is written in C and runs fast.
Sphinxbase: a library required by Pocketsphinx.
Sphinx4: a flexible recognition library written in Java.
Sphinxtrain: a program for training acoustic models.
To work with CMU Sphinx, it is important to remember a few definitions and understand how they differ.
The dictionary is a file that maps lexemes to phonemes (a word and its transcription). For example: calculator (k ay ll k u ll ja t ay r). It is needed to convert the phonemes recognized by the acoustic model into lexemes.
The grammar is a set of formal rules describing how sentences may be built. The lexemes obtained at the previous step are matched against the grammar, and if the match succeeds, the result is output.
The language model is a statistical model of the language. It describes the probabilities of words and their combinations. Recognizing lexemes thus amounts to maximizing the likelihood of the recognized phrase.
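The roles of the dictionary and the language model can be illustrated with a toy Python sketch. It does not use the CMU Sphinx API and is not how Sphinx is implemented: the "word phoneme phoneme ..." line layout, the three-sentence training corpus, the candidate phrases, and the add-one smoothing are all assumptions made for the example. The dictionary entry is the one quoted above.

```python
import math
from collections import Counter

# Assumed dictionary layout: one "word phoneme phoneme ..." entry per line.
DICT_TEXT = "calculator k ay ll k u ll ja t ay r\n"

def parse_dictionary(text):
    """Map a phoneme sequence (as a tuple) back to its word."""
    table = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) >= 2:
            table[tuple(parts[1:])] = parts[0]
    return table

def train_bigrams(sentences):
    """Count word pairs; "<s>" marks the start of a sentence."""
    contexts, bigrams, vocab = Counter(), Counter(), set()
    for sentence in sentences:
        words = ["<s>"] + sentence.split()
        vocab.update(words)
        contexts.update(words[:-1])
        bigrams.update(zip(words, words[1:]))
    return contexts, bigrams, len(vocab)

def log_likelihood(sentence, contexts, bigrams, vocab_size):
    """Sum of log bigram probabilities with add-one smoothing."""
    words = ["<s>"] + sentence.split()
    return sum(math.log((bigrams[(a, b)] + 1) / (contexts[a] + vocab_size))
               for a, b in zip(words, words[1:]))

dictionary = parse_dictionary(DICT_TEXT)
print(dictionary[tuple("k ay ll k u ll ja t ay r".split())])

# Hypothetical training corpus and recognition candidates.
model = train_bigrams(["open the door", "open the window", "close the door"])
candidates = ["open door the", "open the door"]
best = max(candidates, key=lambda s: log_likelihood(s, *model))
print(best)
```

The model prefers the candidate whose word pairs it has seen more often, which is the "maximizing the likelihood" step in miniature.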
The more complex the language and the larger the rule set and vocabulary, the worse the recognition accuracy. Therefore, to minimize errors, it makes sense to create simplified rules that describe the specific task.