Rate what subscribers hear: Voice quality tests

Mean Opinion Scores (MOS) describe the average between perfect pitch and tone deaf

Every September, tensions run high at Germany’s mobile network operators. At this time of year, the special-interest magazine “Chip” publishes its annual mobile network test. A good score in this self-proclaimed “toughest mobile network test in Germany” is a boost for the sales departments; a bad result is a public relations problem. Even in the age of smartphones and LTE, voice quality is a benchmark in this trial, as Chip stated in 2013: “To most users voice calls are more important than LTE.”

Rating of voice quality is a subjective matter

Qosmotec’s executive board member Dr. Dieter Kreuer confirms this conclusion: “No matter how important data rates have become, the main purpose of mobile networks is to enable voice calls.” But testing voice quality in fully automated test systems is easier said than done. Unlike human beings, automated test systems cannot tell whether a sound pattern is interrupted by signal loss or because a speaker has finished a sentence. So how can the quality of sound be measured?

Parameters of a PESQ voice quality test in LTS

In a voice quality test case, users can select any voice sample and specify a MOS limit, which is relevant for the test assessment.

A musician with an accurate ear is more likely to have perfect pitch than somebody who stayed too long at the discotheque the night before. This is exactly the idea behind the so-called “Mean Opinion Score” (MOS), the most important performance indicator for rating voice transmission. Based on a standardized algorithm, it estimates the average opinion you would get if a representative group of people listened to the same sound sample. Each of them would have a different opinion, but taken together, their ratings give a good picture of the sound quality. A distinction is made between referenced rating, i.e., a direct comparison between the original and the transmitted sound, and unreferenced rating, which assesses the transmitted sample without knowledge of the original source.
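The averaging idea behind the score can be illustrated with a minimal sketch. It assumes the commonly used five-point listening scale (1 = bad, 5 = excellent); the ratings and the function name `mean_opinion_score` are invented for illustration and do not come from LTS or from any real listening test. An algorithm such as PESQ estimates this value instead of asking a panel.

```python
from statistics import mean

# Hypothetical ratings from eight listeners on a five-point scale
# (1 = bad, 5 = excellent). The values are invented for illustration.
ratings = [4, 5, 3, 4, 4, 2, 5, 4]


def mean_opinion_score(ratings):
    """Average individual opinion scores into a single MOS value."""
    if not ratings:
        raise ValueError("at least one rating is required")
    if any(not 1 <= r <= 5 for r in ratings):
        raise ValueError("ratings must lie between 1 and 5")
    return mean(ratings)


print(f"MOS = {mean_opinion_score(ratings):.2f}")
```

Individual opinions range from 2 to 5 here, yet the mean condenses them into one figure that can be compared across test runs.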

Standardized solutions for voice quality tests

Voice quality testing is an integral part of Qosmotec’s test automation system LTS. Dr. Kreuer explains how Qosmotec has addressed the issue: “To make voice quality tests in LTS possible, we integrated the algorithm for tests according to the PESQ standard.” Perceptual Evaluation of Speech Quality (PESQ) is the standard defined by the ITU Telecommunication Standardization Sector (ITU-T) to rate voice as described above. In LTS, we use referenced voice testing, so that results really reflect the transmission via the mobile network and are not influenced by badly recorded voice samples on the input side. In principle, the tester can use any voice sample in any language. The PESQ mechanism can also be used to compare voice samples with each other: if a tester wants to check, for example, whether an announcement is correct (or whether it is spoken in the correct language), the received message can be compared with the expected announcement using PESQ. If the two differ, the result is a bad Mean Opinion Score. For real network quality tests, we recommend using a sample that comprises various languages as well as a mix of male and female speakers, which is ideal for obtaining a Mean Opinion Score with high granularity.
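How a MOS limit might turn a score into a test verdict can be sketched as follows. This is not the LTS implementation: the class `VoiceTestResult`, the function `assess`, the sample names, and all numbers are hypothetical, and the sketch assumes a test passes when the score reaches the limit. The MOS values are taken as given, as if a referenced comparison had already produced them.

```python
from dataclasses import dataclass


@dataclass
class VoiceTestResult:
    """Outcome of one referenced voice quality comparison (hypothetical)."""
    sample_name: str
    mos: float        # score from comparing received audio with the reference
    mos_limit: float  # minimum score the tester requires

    @property
    def passed(self) -> bool:
        # Assumption: the test passes when the score reaches the limit.
        return self.mos >= self.mos_limit


def assess(sample_name: str, mos: float, mos_limit: float) -> VoiceTestResult:
    """Bundle a score and its limit into a result with a pass/fail verdict."""
    return VoiceTestResult(sample_name, mos, mos_limit)


# Invented example scores: a clean transmission, and a received announcement
# that does not match the expected one (for instance, the wrong language).
results = [
    assess("multilingual_sample", mos=3.9, mos_limit=3.5),
    assess("expected_announcement", mos=1.4, mos_limit=3.5),
]

for r in results:
    verdict = "PASS" if r.passed else "FAIL"
    print(f"{r.sample_name}: MOS {r.mos:.1f} (limit {r.mos_limit:.1f}) -> {verdict}")
```

The same threshold check covers both use cases from the text: a degraded transmission and a mismatched announcement each show up as a score below the limit.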

Results of a PESQ voice quality test in LTS

LTS lists all voice quality KPIs and depicts them in graphics.

Challenges of the hardware setup

However, integrating a PESQ algorithm into the LTS software does not by itself solve all issues concerning voice quality tests in automated end-to-end systems. The hardware setup also has to be taken into consideration. With commercial handsets, it is almost impossible to execute automated voice quality tests, because the analogue voice output would first have to be digitized. This in turn causes an additional loss of quality that does not stem from network transmission and is therefore irrelevant for testing. For this reason, Qosmotec uses industrial terminals instead of commercial phones for voice testing. These have a digital sound interface and therefore allow the voice to be recorded exactly as it is received.

Evolving technologies require new rating mechanisms

To ensure Qosmotec’s customers can avoid being slated in the press for their speech quality, Kreuer and his team are making LTS fit for the future of voice testing: “Testing various voice technologies, for example High Definition Voice or Voice over LTE, which is going to come soon, is the next issue for voice tests. Our task is to provide easy and effective mechanisms to test these technologies. We are currently integrating the new POLQA (Perceptual Objective Listening Quality Analysis) mechanism, which adapts PESQ for VoIP and is used to test voice transmission over packet-switched standards.”

Mark Hakim