Development of Text to Speech System for Yoruba Language
Speech synthesis is the artificial production of human speech. A computer system used for this purpose is called a speech synthesizer and can be implemented in software or hardware. A text-to-speech (TTS) system converts normal language text into speech. In the last few years, this technology has become widely available for several languages on platforms ranging from personal computers to smart devices, but it has not been fully developed for the Yoruba language, which is spoken by over 30 million of Nigeria's 150 million people. There is therefore a need to develop such a system. Several techniques can be used, including formant synthesis, articulatory synthesis, and concatenative synthesis; this work used the concatenative method. Concatenative synthesis is based on concatenation, i.e. stringing together segments of recorded speech. Yoruba syllables served as the basic units of concatenation, and the system was written in the C# programming language on the .NET Framework. A Yoruba syllable database was merged with the corresponding database of recorded Yoruba syllable sounds so that meaningful pronunciations can be produced.
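The syllable-based concatenation described above can be illustrated with a minimal sketch. It is written in Python rather than the C# used in the original work, and everything in it is an assumption made for illustration: the `UNITS` table, the `fake_recording`, `segment`, and `synthesize` helpers, the sample rate, and the placeholder syllables, which stand in for real recorded Yoruba syllables and ignore tone marks. Short sine tones replace the recordings so that the example runs on its own.

```python
import math

SAMPLE_RATE = 8000  # Hz; assumed value for this sketch


def fake_recording(freq_hz, duration_s=0.05):
    """Stand-in for a recorded syllable: a short sine tone as a list of floats."""
    n = int(SAMPLE_RATE * duration_s)
    return [math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE) for i in range(n)]


# Hypothetical syllable-to-audio database (placeholder tones, not real speech).
UNITS = {
    "ba": fake_recording(220),
    "ta": fake_recording(330),
    "a": fake_recording(440),
}


def segment(word, units):
    """Split a word into known syllables by greedy longest match."""
    longest = max(len(s) for s in units)
    syllables, i = [], 0
    while i < len(word):
        for size in range(min(longest, len(word) - i), 0, -1):
            candidate = word[i:i + size]
            if candidate in units:
                syllables.append(candidate)
                i += size
                break
        else:
            # No stored unit starts at this position.
            raise KeyError(f"no unit covers {word[i:]!r}")
    return syllables


def synthesize(text, units):
    """Concatenate the stored waveform of every syllable in the text."""
    samples = []
    for word in text.lower().split():
        for syllable in segment(word, units):
            samples.extend(units[syllable])
    return samples
```

Calling `synthesize("bata a", UNITS)` returns one flat list of samples built from the units for "ba", "ta", and "a" in that order. A real system would load recorded audio for each syllable and would usually smooth the joins between units, which this sketch does not attempt.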
Text-To-Speech Synthesizer for Wolaytta Language
This paper describes the first text-to-speech (TTS) system for the Wolaytta language, built on the Festival speech synthesis architecture. The TTS is based on diphone concatenative synthesis and applies the residual LPC technique. The conversion from input text to acoustic waveform is performed in a number of steps, each consisting of functional components. The procedures and functions for these steps and their components are discussed. Finally, the performance of the system is measured, and the quality of the synthesized speech is assessed in terms of intelligibility and naturalness. The test results indicate that the majority of the words are recognizable, and the overall performance of the system is found to be 78%. On the MOS scale, the overall intelligibility and naturalness of the synthesized speech are 3.17 and 2.77, respectively.
A Generalized Approach To Amharic Text-to-speech (TTS) Synthesis System
A text-to-speech (TTS) synthesis system converts natural language text into speech. However, the written text of a language contains both standard words (SWs) and non-standard words (NSWs) such as numbers, abbreviations, synonyms, currency, and dates. These NSWs cannot be detected by applying a “letter-to-sound” rule. This work attempts to produce an Amharic TTS system that handles both the standard words (SWs) and the non-standard words (NSWs) of the Amharic language. The model described in this work has two major parts: natural language processing (NLP) and digital signal processing (DSP). The NLP part handles text analysis (transcription of the input SWs and NSWs) and extraction of the speech parameters. The DSP part then generates the artificial speech. The evaluation shows that, on average, 73.35% of words (both SWs and NSWs) are pronounced correctly. In addition, an assessment of the intelligibility and naturalness of the synthesized speech using MOS testing yields scores of 3 and 2.83, respectively. The experiment shows a promising result for designing a practical system that synthesizes both SWs and NSWs for unrestricted text of a language.
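The text-analysis step for non-standard words can be pictured with a small sketch. It is not the method of the work above: the `ABBREVIATIONS` table, the `DIGIT_NAMES` list, and the `normalize` function are hypothetical, and English stand-ins replace the Amharic lexicons a real system would need. It only shows the general idea of rewriting NSWs into ordinary words before letter-to-sound rules are applied.

```python
import re

# Hypothetical English stand-ins; a real system would use Amharic resources.
ABBREVIATIONS = {"dr.": "doctor", "km": "kilometres"}
DIGIT_NAMES = ["zero", "one", "two", "three", "four",
               "five", "six", "seven", "eight", "nine"]


def normalize(text):
    """Rewrite non-standard words (abbreviations, digit strings) as plain words."""
    words = []
    for token in text.split():
        key = token.lower()
        if key in ABBREVIATIONS:
            # Abbreviation: replace with its expansion.
            words.append(ABBREVIATIONS[key])
        elif re.fullmatch(r"[0-9]+", token):
            # Number: read digit by digit (a simplification for this sketch).
            words.append(" ".join(DIGIT_NAMES[int(d)] for d in token))
        else:
            # Standard word: pass through unchanged.
            words.append(token)
    return " ".join(words)
```

For example, `normalize("Dr. Abebe walked 12 km")` gives `"doctor Abebe walked one two kilometres"`. Reading numbers digit by digit keeps the sketch short; a usable normalizer would also need rules for full number names, currency, and dates.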
Text to Speech System
Speech and spoken words have always played a big role in the individual and collective lives of people. Speech represents the spoken form of a language and is also one of the most important means of communication. Synthetic speech may be used in several applications such as telecommunications services, language education, aids for handicapped persons, and fundamental and applied research. A TTS system has to face many challenges during the conversion of text to speech. The most important qualities expected from a speech synthesis system are naturalness and intelligibility. Wars have been won and peace agreements have been made because of the magical words of a few who knew how to give life to their words.
Malay Text to Speech Synthesis System
Recently, human-computer interaction systems involving speech recognition, speech synthesis, and related technologies have experienced tremendous growth, resulting in many applications being developed and commercialized. For instance, Microsoft recently launched a version of Microsoft Office that possesses the capability to pronounce (or read aloud) the input text using a speech synthesis engine. Indeed, speech synthesis is important in assisting humans in various areas such as telephone speech, in-car applications, public information systems, education assistance tools, and email reading. A text-to-speech (TTS) system is a speech synthesis tool capable of pronouncing any raw input text aloud. This book covers a review of speech synthesis systems and the design process for a Malay text-to-speech system that uses the Festival speech synthesis system. It provides the details of the database design process, system implementation, testing, and evaluation. The target audience for this book is those who have a speech processing research background or who are interested in learning more about speech synthesis systems.
Development of Speech Audiometry
A comprehensive hearing evaluation should include an assessment of hearing that uses test stimuli similar to the auditory information conveyed during speech communication. Speech audiometry refers to a battery of testing procedures and protocols aimed at estimating an individual's ability to hear and understand speech. Recently, efforts have been made to create high-quality materials for the Speech Reception / Recognition Threshold (SRT), and for other types of speech audiometry, in a variety of languages. Because India is multilingual, speech audiometry materials have been developed in a number of its languages. However, there are still many languages for which no materials are widely available, such as Tulu. Tulu is considered one of the five Dravidian languages of South India. Though Tulu is confined to a small region of India, it possesses a very rich vocabulary and has been considered a highly developed language by linguists. Testing in the native language is preferred so that the results are accurate and relevant. The purpose of this project was to develop and digitally record materials in Tulu for measuring the SRT.
Language Modeling Approaches for Improving Tamil Speech Recognition
This thesis proposes a new approach to improving the performance of Tamil speech recognition using language models. The main contribution of this thesis is the development of language models that capture the co-occurrence patterns of partially free word order languages like Tamil. The models use sub-word units such as phonemes, syllables, and morphemes as the basic components of the language model. In addition, the thesis explains the use of various language models at different levels of error correction. Language models based on different sub-word units, sub-word unit features, and contexts, designed to capture the characteristics of the Tamil language, were used in this work to improve the error correction rate of a Tamil speech recognition system. The language models described in this work are not word dependent but are based on sub-word units like phonemes, syllables, and morphemes, so they capture vocabulary-independent linguistic co-occurrences of the language under consideration. The language-model-based error correction discussed in this work therefore performs step-by-step, sub-word-unit-based error correction that is also vocabulary independent.
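One simple way to picture a sub-word language model used for error correction is a bigram model over unit sequences that ranks competing recognizer hypotheses. The sketch below is an assumption-laden illustration, not the models of the thesis: the functions `train_bigrams`, `log_prob`, and `best_candidate` are hypothetical names, add-one smoothing is chosen only for brevity, and the units are whatever tokens the caller supplies (syllables, phonemes, or morphemes).

```python
import math
from collections import Counter


def train_bigrams(sequences):
    """Count unit bigrams over training sequences of sub-word units."""
    contexts, bigrams = Counter(), Counter()
    for seq in sequences:
        padded = ["<s>"] + list(seq) + ["</s>"]
        contexts.update(padded[:-1])            # every unit that has a successor
        bigrams.update(zip(padded, padded[1:]))
    vocab = {unit for seq in sequences for unit in seq} | {"<s>", "</s>"}
    return contexts, bigrams, len(vocab)


def log_prob(seq, model):
    """Add-one smoothed log probability of a unit sequence."""
    contexts, bigrams, vocab_size = model
    padded = ["<s>"] + list(seq) + ["</s>"]
    return sum(
        math.log((bigrams[(a, b)] + 1) / (contexts[a] + vocab_size))
        for a, b in zip(padded, padded[1:])
    )


def best_candidate(candidates, model):
    """Pick the hypothesis the sub-word model considers most likely."""
    return max(candidates, key=lambda seq: log_prob(seq, model))
```

Given a model trained on unit sequences and a list of alternative hypotheses for the same stretch of speech, `best_candidate` returns the one whose unit sequence is most consistent with the training data. Because the model scores units rather than whole words, it can rank hypotheses containing words that never appeared in training.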
A Punjabi to English Machine Translation System for Legal Documents
The book is a research work that shows the development of a Machine Translator that converts a text written in one natural language to another. My book contains examples for conversion of Punjabi text to English. The transliteration of Punjabi text is also given in English to increase its readability. It is very helpful for a researcher to design a Translator from any natural language to another natural language.
Speech Recognition System
Speech recognition systems have become applicable in a wide range of areas as various speech recognition methodologies, techniques, and tools have been developed and implemented to generate natural and intelligible speech. In this regard, this work explores the possibility of developing a prototype speech recognition system for the Sidama language using a Hidden Markov Model. The study extensively examined the language features, the components, the speech recognition tools, and the techniques used in speech recognition design, and identified those components that depend on the characteristics of the language. Finally, this work has demonstrated a working prototype speech recognizer for the language, tested the performance of the system, compared its accuracy, and recommended measures for similar research and projects. This work will therefore be useful to researchers, speech application developers, educators, and other individuals or institutions working on similar projects.
Part of Speech Tagging for Pashto
This book presents the first rule-based part-of-speech tagger for the Pashto language. In natural language processing, part-of-speech tagging plays a vital role. It is a significant prerequisite for putting a human language on the engineering track. Before a part-of-speech tagger can be developed, a tagset is required for the language. First, a tagset containing 54 tags is created for Pashto according to its syntactic properties. A simple architecture is then proposed for the Pashto part-of-speech tagger. The architecture contains a tokenizer, a lexicon, and rules for disambiguation and for new words. The lexicon contains words with their tags and grows with each new word as more and more text is tagged. This architecture is implemented and tested on real-world data. The accuracy was low in the beginning because the lexicon and rules were very limited. Text is tagged with this tagger and the tags of new words are corrected manually, which results in the growth of both the lexicon and the rules. When the lexicon reached 100,000 words and the rules grew to 120, the accuracy reached 88%. The accuracy will increase further as more words and rules are added.
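The architecture of tokenizer, lexicon, disambiguation rules, and rules for new words can be sketched as follows. The sketch is hypothetical throughout: it uses English placeholder words and a handful of invented tags instead of Pashto and the 54-tag tagset, and the two rules in `disambiguate` and `guess_unknown` are made up for illustration. It also adds guessed tags to the lexicon automatically, whereas the work above corrects new words manually.

```python
import re

# Hypothetical lexicon: each word maps to its possible tags.
LEXICON = {"the": ["DET"], "dog": ["NOUN"], "walks": ["NOUN", "VERB"]}


def tokenize(text):
    """Split text into word tokens and single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", text)


def disambiguate(candidates, previous_tag):
    """Invented contextual rule: after a noun, prefer the verb reading."""
    if previous_tag == "NOUN" and "VERB" in candidates:
        return "VERB"
    return candidates[0]


def guess_unknown(word):
    """Invented rule for new words: '-ly' suggests an adverb, else a noun."""
    return "ADV" if word.endswith("ly") else "NOUN"


def tag(text, lexicon):
    """Tag each token, growing the lexicon whenever a new word is seen."""
    tagged, previous = [], None
    for token in tokenize(text):
        key = token.lower()
        if key in lexicon:
            chosen = disambiguate(lexicon[key], previous)
        else:
            chosen = guess_unknown(key)
            lexicon[key] = [chosen]  # the lexicon grows with each new word
        tagged.append((token, chosen))
        previous = chosen
    return tagged
```

Tagging "The dog walks quickly" with a copy of `LEXICON` resolves "walks" to `VERB` through the contextual rule and adds "quickly" to the lexicon as `ADV`. In a workflow like the one described, a human would review such guesses before they are kept.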
Evaluation of Development Communication Content of Yoruba Newspapers
This study investigated Yoruba-language newspapers, particularly in relation to their development communication content. The study principally adopted the content analysis method, supplemented by stylistic analysis. The three newspapers content-analysed were Gbohungbohun, Iroyin Yoruba, and Isokan. Yoruba-language newspapers are weeklies. A total of 125 editions of the newspapers, spanning eleven years (1986-1996), were sampled. Stylistic analysis of the newspapers was done at the following levels: the graphetic/graphological, the grammatical, the lexical, and the semantic. The following are some of the findings. Development-oriented items constituted 27.4% of the entire editorial content of the newspapers analysed. Among other measures, 17.3% of these stories were placed on the front page, 16.4% on the back page, and 66.3% on the inside pages. The treatment given to development-oriented stories in the newspapers was considered to be fair. The newspapers disseminated their development messages mostly through the news genre, which constituted 66.9% of all development-oriented items content-analysed.
A Part of Speech Tagging Model for Albanian
With the enormous growth of digital information, it is necessary to find advanced ways to process it. The goal is to enhance information retrieval, information extraction, and natural language processing. One of the most complicated processes is text mining, which deals with finding high-quality information in text. This book presents a statistical part-of-speech tagging model for Albanian. The training, testing, and evaluation processes are carried out with the Apache OpenNLP tool. Tagging is performed with both a basic and a large tagset. The experiments are performed on a tagger model trained on a corpus of standard Albanian text written by Albanian authors. The tagger model is tested using cross-validation and a sample text. The results showed that the accuracy of the trained tagger model in real testing environments was about 70%, and up to 98% when the environment settings were optimized for the best accuracy. It was also noticed that the overall accuracy of this model depends on the number of training tokens, the level of grammatical and morphological complexity of the text, and special cases in language expressions.
Multi-Purpose Speech Recognition and Speech Synthesis System
This book describes a multi-purpose speech recognition and speech synthesis system that includes applications employing Speech Recognition (speech-to-text) and Speech Synthesis (text-to-speech) technologies. The input of the system is a speech signal, and it can also be typed text or graphical triggers. The output can be a speech signal or text, along with graphical user interface forms. The applications are a dictionary, calculator, search engine, movie guide, movie search, news reader, address book, image gallery, and weather forecaster. This book includes explanations of Speech Recognition, Speech Synthesis, and the functionality of the system and its applications. It also explains how a special hand-gesture-controlled cursor and its hardware were developed. The system was developed as the graduation project of Hussein Mohsen, Karim Jahed, and Marwan Fawaz at the Lebanese American University (LAU), Beirut, Lebanon, for the Spring 2011 semester. The project was supervised by Dr. Nashat Mansour, Professor of Computer Science and Assistant Dean of the School of Arts and Sciences at LAU-Beirut.
Tone Labelling Algorithm for Sesotho
Studies have shown that text-to-speech systems need detailed prosodic models of a language in order to sound natural to native speakers of that language. A text-to-speech system developed for Sesotho needs to implement tone, since Sesotho is a tonal language that uses pitch variations to distinguish lexical and/or grammatical meaning. In order to implement tone for a language such as Sesotho, a tone modeling algorithm must receive as input the tone labels of the syllables of a word. This allows the algorithm to predict the appropriate intonation of the word. The aim of our study is to improve a basic tone labeling algorithm that predicts tone labels using three Sesotho tonal rules. The application of this algorithm is restricted to polysyllabic verb stems. The research study involves implementing an extended tone labeling algorithm that implements four additional Sesotho tonal rules and extends its application to all the other parts of speech.
