Distributed Systems: Principles and Paradigms, 2nd Edition, by Andrew S. Tanenbaum and Maarten van Steen. This second edition reflects a major revision in comparison to the previous one.
Andrew S. Tanenbaum, Vrije University, Amsterdam, The Netherlands. Maarten Van Steen.
The examples in the book leave out many details for readability, but the complete code is available through the book's Website, hosted at www. A personalized digital copy of the book is available for free, as well as a printed version through site. About the Authors Maarten van Steen is a professor at the Vrije Universiteit, Amsterdam where he teaches operating systems, computer networks, and distributed systems.
He has also given various highly successful courses on computer systems related subjects to ICT professionals from industry and governmental organizations. Andrew S. Tanenbaum has a B.
Degree from M. Financial and material support should also be mentioned. Thanks to anonymous reviewers are not appropriate. Conflict of Interest Statement Authors will be asked to provide a conflict of interest statement during the submission process. Submitting authors should ensure they liaise with all co-authors to confirm agreement with the final statement.
Abstract The abstract should not exceed words unless absolutely necessary, and should under no circumstances exceed words. The abstract should appear as a single paragraph, which should enable readers to quickly comprehend the thrust of the article prior to reading the article itself.
Objectives, experimental design, principal observations, and conclusions should be succinctly summarized for research articles and techniques. Abbreviations should be avoided. Reference citations within the abstract are not permitted. Please provide main keywords. Keywords Please provide up to 7 keywords. Footnotes to the text are not allowed and any such material should be incorporated into the text as parenthetical matter.
References References should be prepared according to the Publication Manual of the American Psychological Association 6th edition. This means in text citations should follow the author-date method whereby the author's last name and the year of publication for the source should appear in the text, for example, Jones, The complete reference list should appear alphabetically by name at the end of the paper.
Examples of APA references are listed below. Please note that a DOI should be provided for all references where available. Please note that for journal articles, issue numbers are not included unless each issue in the volume begins with page one. Journal article Beers, S. Neuropsychological function in children with maltreatment-related posttraumatic stress disorder. The American Journal of Psychiatry, , — Psychoeducational assessment of students who are visually impaired or blind: Infancy through high school 2nd ed.
Austin, TX: Pro-ed. Internet Document Norton, R. How to train a cat to operate a light switch [Video file]. Retrieved from www. They should be numbered in the list and referred to in the text with consecutive, superscript Arabic numerals. Keep endnotes brief; they should contain only short comments tangential to the main argument of the paper.
Footnotes Footnotes should be placed as a list at the end of the paper only, not at the foot of each page. Keep footnotes brief; they should contain only short comments tangential to the main argument of the paper and should not include references. Figure Legends Legends should be concise but comprehensive—the figure and its legend must be understandable without reference to the text.
The speech production mechanism articulates a series of phonemes nonuniformly, according to an empirical statistical law formulated by George Kingsley Zipf, a linguist [ 26 ], referring to the principle of least effort from the field of evolutionary biology: interlocutors try to understand each other using phonemes and words that are easier to produce and perceive in a particular context.
The knowledge of phoneme and word statistics was introduced into ASR algorithms long ago, and stochastic speech models such as the hidden Markov model (HMM) [ 27 ] were the prevailing scientific paradigm and represented the state of the art in the speech recognition and synthesis community for decades. On the perception side, the continuum of acoustic waves reaches the ear of the listener, where certain frequencies excite the eardrum and, via the malleus, incus, and stapes, the cochlea. There, spectral analysis is performed, based on the movement of the basilar membrane, whose length is about 35 mm [ 17 , 22 , 23 , 25 , 28 ].
The hair cells in the cochlea respond to different sounds based on their frequency so that high-pitched sounds stimulate the hair cells in the lower part of the cochlea, while low-pitched sounds stimulate the upper part of the cochlea [ 28 ]. Thus formed neural impulses are sent to the central auditory system in the brain [ 22 ], and based on spectral differences, the brain recognizes relevant acoustic differences and attempts to recover the string of phones that the original message was composed of, taking into account its language model at the level of morphology, syntax, semantics, and pragmatics.
It can thus be considered that the task of ASR is to reduce the bit rate of, e. However, speech perception, which principally relies on the sense of hearing, is a nonlinear process. As is the case with other human senses (vision, taste, touch, and smell), auditory perception of both sound pressure level (SPL) and fundamental frequency (f0, pitch) follows the Weber-Fechner law [ 28 ] from psychophysics: a change perceived as linear corresponds to an exponential change in the physical stimulus.
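The logarithmic relation between stimulus and perception can be illustrated with two common log scales: decibels for sound pressure and semitones for pitch. The following is a minimal sketch, not taken from the text; the 20 µPa reference pressure and the function names are assumptions chosen for illustration.

```python
import math

P_REF = 20e-6  # assumed reference sound pressure in pascals (a common 0 dB SPL convention)


def spl_db(pressure_pa: float) -> float:
    """Sound pressure level in dB relative to P_REF."""
    return 20.0 * math.log10(pressure_pa / P_REF)


def semitones(f_hz: float, f_ref_hz: float) -> float:
    """Pitch interval between two frequencies on a logarithmic (semitone) scale."""
    return 12.0 * math.log2(f_hz / f_ref_hz)


# Exponential growth of the stimulus gives equal steps on the log scale:
# each tenfold increase in pressure adds the same number of dB,
# and each doubling of f0 adds the same number of semitones.
steps_db = [spl_db(P_REF * 10 ** k) for k in range(4)]
steps_st = [semitones(110.0 * 2 ** k, 110.0) for k in range(4)]
```

Here `steps_db` and `steps_st` grow by a constant amount per step even though the underlying pressure and frequency grow geometrically, which is the behaviour the law describes.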
Apart from SPL and pitch, perception of sound is affected by the distribution of sound energy across frequencies, i. This is why common speech features such as cepstral coefficients are computed at frequencies rescaled from Hz to the mel scale (mel-frequency cepstral coefficients, MFCC); they are estimated by cepstral analysis from speech frames of 20-30 ms, together with their first and second derivatives calculated from several successive frames [ 29 ]. Auditory scene analysis is the process by which the auditory system separates individual sounds in natural-world situations [ 30 , 31 ].
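The rescaling from Hz to the mel scale and the derivative features computed across successive frames can be sketched as follows. This is an illustrative sketch under stated assumptions: the conversion formula is one of several commonly used mel approximations, the derivative is a simple central difference rather than the regression formula used by speech toolkits, and all function names are hypothetical.

```python
import math


def hz_to_mel(f_hz: float) -> float:
    """One common Hz-to-mel approximation (an assumption, not from the text)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel: float) -> float:
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_band_edges(f_min_hz: float, f_max_hz: float, n_bands: int) -> list:
    """Edges of n_bands filters equally spaced on the mel scale, returned in Hz."""
    lo, hi = hz_to_mel(f_min_hz), hz_to_mel(f_max_hz)
    step = (hi - lo) / (n_bands + 1)
    return [mel_to_hz(lo + i * step) for i in range(n_bands + 2)]


def deltas(frames: list) -> list:
    """Central difference between neighbouring feature frames (edges are clamped)."""
    n = len(frames)
    out = []
    for t in range(n):
        prev_f = frames[max(t - 1, 0)]
        next_f = frames[min(t + 1, n - 1)]
        out.append([(b - a) / 2.0 for a, b in zip(prev_f, next_f)])
    return out
```

Applying `deltas` once to a sequence of cepstral vectors gives first-derivative features, and applying it again to the result gives second-derivative features. Bands returned by `mel_band_edges` are narrow at low frequencies and widen towards high frequencies.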
Regardless of whether sound is received by a human ear or a microphone, the incident sound pressure wave represents a sum of pressure waves coming from different individual sources, which can be either human voices or any other sound sources. These sounds usually overlap in both time and frequency. Nevertheless, the human auditory system is usually able to concentrate on an individual sound source at a time [ 23 , 31 ]. While listening and separating one source, the listener constructs a separate mental description for that source.
For example, if a student listens to the teacher, he ignores the noise from the LCD projector and a colleague who may be speaking to him; if he switches the focus to his colleague, he cannot actively listen to the teacher anymore. The more experience humans have with real-world situations, the more successful they are at sound separation, and they always analyse the incoming signal using heuristic processes. As the ultimate step of the hearing process, the human auditory cortex constructs a cognitive representation of the received sound wave.
Without the cognition step, sound waves coming to the ears are not perceived. Heuristic analysis is based on (ir)regularities in the sum of underlying sounds. Individual sounds differ from each other in at least one of the following dimensions: time, space, and frequency spectrum [ 28 , 31 ].
Temporal and spatial sensations in the human auditory system are presented in more detail in [ 32 ]. In a specific environment, binaural hearing enables the localization of sound sources, which is easier, but also often more important, in the horizontal plane where human ears are positioned than in the vertical plane. The spectrum of frequency components determines the perceived pitch, timbre, and loudness, and the difference in the spectra of sounds received by the two ears enables the localization of sound sources [ 23 , 31 , 32 ].
Pitch is related to the fundamental frequency f0 in periodic sound waves such as musical tones or vowels in speech; their spectrum consists of f0 and its harmonics. Temporal variation of f0 results in melody in music and intonation in speech. Timbre represents a specific distribution of the intensities of f0 and its harmonics in the spectrum.
Two renditions of the same tone from two different musical instruments, having the same f0, will have different timbres due to the difference in the relative intensities of particular harmonics (the spectral envelope), and as a result, they will sound different [ 22 ].
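The distinction between pitch and timbre can be made concrete with two synthetic tones that share f0 but differ in the relative intensities of their harmonics. The sketch below is illustrative only; the sample rate, tone length, amplitudes, and function names are assumptions, and the harmonic amplitudes are measured with a plain single-frequency DFT.

```python
import cmath
import math


def synth_tone(f0_hz, harmonic_amps, sample_rate=8000, n_samples=800):
    """Sum of sinusoids at f0, 2*f0, 3*f0, ... with the given amplitudes."""
    return [
        sum(a * math.sin(2 * math.pi * f0_hz * (k + 1) * n / sample_rate)
            for k, a in enumerate(harmonic_amps))
        for n in range(n_samples)
    ]


def harmonic_magnitudes(signal, f0_hz, n_harmonics, sample_rate=8000):
    """Amplitude of each harmonic, estimated with a single-frequency DFT."""
    n_total = len(signal)
    mags = []
    for k in range(1, n_harmonics + 1):
        freq = f0_hz * k
        acc = sum(x * cmath.exp(-2j * math.pi * freq * n / sample_rate)
                  for n, x in enumerate(signal))
        mags.append(2.0 * abs(acc) / n_total)
    return mags


# Same f0 (same pitch), different spectral envelopes (different timbre).
tone_a = synth_tone(200.0, [1.0, 0.5, 0.25])
tone_b = synth_tone(200.0, [1.0, 0.1, 0.8])
envelope_a = harmonic_magnitudes(tone_a, 200.0, 3)
envelope_b = harmonic_magnitudes(tone_b, 200.0, 3)
```

Both tones have energy at exactly the same frequencies (200, 400, and 600 Hz), so they share a pitch, while `envelope_a` and `envelope_b` recover the two different amplitude patterns that make them sound different.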
If a sound spectrum does not contain just harmonic tones (f0s and their harmonics), the spectrum is not discrete; the sound spectrum is rich with frequency components in parts of, or across the entire, frequency range of the human auditory sense.
Such sounds, with a spectrum that is more or less continuous, are much more frequent in nature e. Acoustic signals are received by a listener and transformed into linguistic and nonlinguistic categories, but it is not known exactly how.
There is ongoing research on neurophysiology of speech communication using the latest advances in invasive and noninvasive human recording techniques, with the aim to uncover fundamental characteristics of cortical speech processing [ 16 ].
The research team in question has studied phonetic feature encoding and mechanisms of noise robust representation of speech in auditory cortex based on the evidence that humans and animals can reliably perceive behaviourally relevant sounds in noisy and reverberant environments.
Neuro-inspired computational models try to provide progress in artificial deep neural network (DNN) performance, based on a better understanding of the representation and transformation performed by these models. A case study in ASR given in [ 33 ] attempts to identify the mechanisms that normalize the natural variability of speech and compares these mechanisms with findings of speech representation in the human auditory cortex.
The aim is to compare DNNs with their biological counterparts, identify their limitations, and reduce the performance gap between biological systems and artificial computing. An algorithm for focusing on one speaker in a group of many speakers, built on a deep attractor network and following similar principles, is proposed in [ 34 ]. It has been shown that switching attention to a new speaker instantly changes the neural representation of sound in the brain.
An adaptive system should change the sensory representation in real time to implement novel, task-driven computations that facilitate the extraction of relevant acoustic parameters. Human listeners have a remarkable ability to understand quickly and efficiently the world around them based on behaviour of known sound sources.
Moreover, they are able to pay attention and focus on the meaning of speech of a particular speaker. Attentional focus can be integrated into HCI dialogue strategy [ 35 ], while data related to human cognitive effort can be used in postprocessing and improvement of the performance of ASR systems [ 36 ].
Humans are able not only to separate one speaker or concentrate on only one sound source but also to group more sound sources and hear, e. Concurrent and sequential grouping processes are described in more detail in [ 37 ]. The role of the nonlinearities in DNNs in the categorization of phonemes, through their nonuniform and nonlinear warping of the acoustic space, is studied in [ 38 ], as well as the way perceptually invariant categories are created. Biological neurons are able to dynamically change the synaptic efficacy in response to variable input conditions.
This is called synaptic depression, and when it is added to the hidden layers of a DNN trained for phoneme classification, the ASR system becomes more robust to noisy conditions without explicitly being trained for them. The results from [ 39 ] suggest that more complete neuron models may further reduce the gap between biological performance and artificial computing, resulting in networks that generalize better to novel signal conditions.
Engineering vs. Linguistic Point of View to NLP as a Typical AI Topic
The mechanism of speech production and the physical component of sound perception are relatively well-studied topics [ 22 , 31 ], while cognitive aspects of speech communication still represent a wide-open research area. All aspects of human-machine speech communication that are related to linguistics, such as natural language processing (NLP), cognitive sciences (neurolinguistics), and dialogue management (see Figure 1), represent great challenges to the scientific community.
In the recent past, the development of speech technology and spoken dialogue systems has gained most momentum from the engineering disciplines, through the possibility of automatic learning from vast quantities of data, in terms of development of computational facilities, complex learning algorithms, and sophisticated neural model architectures addressing particular phenomena and problems of cognitive linguistics.
At the same time, cognitive speech sciences mostly remain outside of the scope of the immediate interest of engineering disciplines relevant to speech technology development. Nevertheless, the knowledge in these areas overlaps in the concept and scope with machine learning, which, inspired by neurosciences, has brought about progress not only in human-computer interaction and computational linguistics but also in the area of spoken language processing, which lies in their intersection.
This is indicated in Figure 1, which also shows a relatively wide gap between cognitive sciences (neuroscience and psycholinguistics) on one side and predominantly engineering disciplines on the other. As regards the role of machine learning in the development of speech technology, it has offered a powerful alternative to models dependent on linguistic resources and modules performing particular linguistically motivated subtasks.
Linguistic resources such as dictionaries and speech databases are typically quite expensive and time-consuming to collect and annotate, while the development of modules that compose a speech technology system requires deep domain knowledge and expert effort. In the last two decades, some of the tasks performed by rule-based systems or simpler machine learning methods have, one by one, been taken over by neural networks. Namely, in the case of acoustic speech recognition, neural networks have been shown to outperform hidden Markov models (HMMs) in acoustic modelling [ 40 ], and they have also outperformed classical N-gram language models in terms of generalization, using architectures based on either long short-term memory (LSTM) neurons [ 41 ] or recurrent neural networks (RNN) [ 42 ].
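To make the comparison with classical N-gram language models more concrete, the following is a minimal sketch of a bigram model with add-one smoothing. It is a toy illustration, not the models evaluated in the cited work; the corpus, the sentence-boundary tokens, and the function names are assumptions.

```python
from collections import Counter, defaultdict


def train_bigram(sentences):
    """Count word histories and bigrams over whitespace-tokenized sentences."""
    history = Counter()
    bigram = defaultdict(Counter)
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        for prev, cur in zip(tokens, tokens[1:]):
            history[prev] += 1
            bigram[prev][cur] += 1
    # Words that can be predicted: every corpus word plus the end-of-sentence token.
    vocab = {w for s in sentences for w in s.split()} | {"</s>"}
    return history, bigram, vocab


def bigram_prob(prev, cur, model):
    """P(cur | prev) with add-one (Laplace) smoothing."""
    history, bigram, vocab = model
    return (bigram[prev][cur] + 1) / (history[prev] + len(vocab))


model = train_bigram(["the cat sat", "the dog sat"])
p_seen = bigram_prob("the", "cat", model)    # bigram observed in the corpus
p_unseen = bigram_prob("the", "sat", model)  # bigram never observed
```

An unseen word pair receives only the small probability mass supplied by smoothing, regardless of how plausible it is. This limited generalization is the kind of behaviour that neural language models are reported to improve on.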
Solutions based on neural networks have been shown to reach human parity in tasks as complex as casual conversational speech recognition [ 43 ]. In combination with a range of data-synthesis techniques for obtaining large quantities of varied training data, it is now possible to obtain an end-to-end ASR system capable of outperforming state-of-the-art pipelines in recognizing clear conversational speech as well as noisy speech [ 44 , 45 ].
They have also been used in multimodal speech recognition, i. The task of speech synthesis is a more language-dependent one, and in that it is more challenging since it aims to reintroduce the redundancy which is lost when speech is converted into text, and to do it in such a way that, among a multitude of prosodic renditions of a particular utterance, it produces one that the listener will consider acceptable in a given context.
Here again, neural networks have been shown to outperform classical models working on parameterized speech, such as HMMs [ 47 , 48 ], in acoustic modelling, and they have also been employed for prosody modelling [ 49 ] as well as the modelling of acoustic trajectories [ 50 ].
Neural networks have also addressed the problem of a somewhat muffled character of synthesized speech due to the use of a vocoder, by performing synthesis of raw speech waveforms instead [ 51 ]. Finally, to overcome the need for sophisticated speech and language resources that require deep domain expertise, a range of end-to-end architectures were proposed, with the ultimate end that the system should be trained on pairs of text and audio, exploiting the capability of neural networks to automatically develop higher-level abstractions [ 52 ].
The flexibility of such a powerful data-driven approach in comparison with classical speech concatenation synthesizers has also brought significant progress in the area of multispeaker TTS and speaker adaptation [ 53-55 ], as well as the ability to conform to a particular speech style or emotion [ 56 ].
This is particularly relevant as it coincides with the emergence of applications such as smart environments, virtual assistants, and intelligent robots, demanding high-quality speech synthesis in different voices and different styles and conveying different emotional states of the perceived speaker [ 57 ].
Other language technology tasks have also been successfully addressed by neural networks, such as question answering [ 58 ], text classification [ 59 , 60 ], machine translation [ 61 , 62 ], and sentiment analysis [ 63 ]. Neural networks have also been used as a powerful linguistic tool, for modelling sentence syntax [ 64 ] or exploring particular linguistic phenomena such as establishing word representations in vector spaces [ 65 ].
However, rather than providing a decomposition of the problem and a clear analytical insight into it, neural networks provide an alternative, data-driven point of view, and thus cannot be considered a classical tool of theoretical linguistics. On the other hand, their performance in solving these problems justly makes neural networks state of the art in the development of speech technology.
Progress in Speech Recognition and Synthesis, as well as Dialogue Systems
Apart from automatic speech recognition (ASR) and text-to-speech synthesis (TTS), a human-machine speech dialogue system also includes a dialogue management module with corresponding dialogue strategies and language technologies for spoken language understanding (SLU) and spoken language generation (SLG), as illustrated in Figure 4.
Figure 4: Components of a human-machine speech dialogue system.
They have been developed with an effort to combine interdisciplinary knowledge from different areas such as linguistics, acoustics, computer science, and mathematics. Signal processing engineers usually have an integrating role between linguists on one side and mathematicians on the other.
Progress of Automatic Speech Recognition Systems
Research and development of ASR systems began in the s at Bell Labs, with simple digit recognition systems, and since then the recognition tasks have become more complex: from the recognition of isolated digits, then isolated words, then continuously spoken words in a silent environment, up to the recognition of spontaneous speech in a noisy environment.
Consequently, the complexity of the algorithms used also increased drastically. A brief review of the historical development of ASR can be found in [ 66 ]. There were three important moments in the development of ASR systems: the introduction of mel-frequency cepstral coefficients [ 67 ], the introduction of statistical methods (hidden Markov models (HMM) with Gaussian mixture models (GMM)) [ 68 ], and the introduction of deep neural networks (DNN) [ 69 ].
This development was also supported by the technological development in the computer industry as well as the increase in the amount of data available for training these systems. For a small database, such as English Broadcast News (about 30 h of training data), the difference in word error rates (WER) was not significant, but for the Switchboard database, which is bigger (about h of training data), the difference became substantial.
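Word error rate, the metric used in this comparison, is conventionally computed as the word-level edit distance between the reference transcript and the recognizer output, divided by the number of reference words. The sketch below is a minimal illustration under that assumption; it uses whitespace tokenization and does no text normalization, and the function name is hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """(substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words processed
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            cur[j] = min(
                prev[j] + 1,         # deletion of a reference word
                cur[j - 1] + 1,      # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            )
        prev = cur
    return prev[-1] / len(ref)


# One substitution (sat -> sit) and one deletion (the) against six reference words.
wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
```

Because insertions are counted, the value can exceed 1.0 when the hypothesis is much longer than the reference.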
Further improvement of DNN was based on better optimization, new activation functions, new network architectures, new speech preprocessing methods, and leveraging multiple languages and dialects [ 70 ].
One of the important findings was that layer-by-layer pretraining using restricted Boltzmann machines (RBM) is not obligatory and that the backpropagation algorithm is sufficient for training, given a large quantity of available training data and a large number of units in the hidden layers.
Additionally, LeCun et al. The next big step was the complete elimination of HMM from the model. Graves and Jaitly [ 72 ] reported a speech recognition system that directly transcribes audio data into text, without requiring an intermediate phonetic representation. The system is based on a combination of the deep bidirectional long short-term memory (LSTM) recurrent neural network architecture and the connectionist temporal classification (CTC) objective function.
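The way a CTC-trained network maps a frame-level output to a shorter grapheme sequence can be illustrated with greedy decoding: take the most likely symbol per frame, merge consecutive repeats, then drop the blank symbol. This sketch shows only that output convention, not the training objective or the system reported in [ 72 ]; the blank character, the toy alphabet, and the function names are assumptions.

```python
from itertools import groupby

BLANK = "_"  # assumed blank symbol; real systems reserve a dedicated index for it


def ctc_collapse(path):
    """Merge consecutive repeated symbols, then remove blanks."""
    merged = [symbol for symbol, _ in groupby(path)]
    return "".join(s for s in merged if s != BLANK)


def ctc_greedy_decode(frame_probs, alphabet):
    """Pick the most probable symbol in every frame and collapse the path."""
    best_path = [
        alphabet[max(range(len(probs)), key=probs.__getitem__)]
        for probs in frame_probs
    ]
    return ctc_collapse(best_path)


alphabet = [BLANK, "a", "b"]
frame_probs = [
    [0.1, 0.8, 0.1],    # a
    [0.2, 0.7, 0.1],    # a (repeat, merged with the previous frame)
    [0.9, 0.05, 0.05],  # blank separates the two a's
    [0.1, 0.6, 0.3],    # a
    [0.1, 0.2, 0.7],    # b
]
decoded = ctc_greedy_decode(frame_probs, alphabet)
```

The blank between the second and fourth frames is what allows a genuinely doubled letter to survive the merging step, so the five frames decode to a three-letter string.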
Such a direct mapping of an audio signal into a grapheme sequence allows easy application of the system to new languages such as Serbian [ 73 ]. Inspired by CTC, Povey et al. This method was also successfully applied to Serbian [ 75 ]; i.
Progress of Speech Emotion Recognition
Since humans are not always rational and logical beings, emotions play a very important role in the acceptance of new products and technologies [ 76 ].
The earliest attempts to recognize speaker emotional state on the basis of voice characteristics date back to the s [ 77 ].
The initial motive for this research direction was the adaptation of an ASR system to emotionally stressed speech [ 78 ], but another motive appeared with the development of spoken dialogue systems, where it was useful to modify the dialogue strategy based on, e.
There are a number of emotions that can be easily represented in the activation-evaluation space [ 80 ], but classification of such a large number of emotions is difficult. Hence, classification space has been reduced to neutral and 6 archetypal emotions: anger, disgust, fear, joy, sadness, and surprise, which are the most obvious and distinct emotions [ 80 ]. One of the important steps in the design of a speech emotion recognition system is the extraction of features that efficiently discriminate between emotions independently of lexical content, speaker, and acoustic environment.