With the growing volume and variety of data, machine learning methods capable of
leveraging diverse sources of information have become highly relevant. Deep learning-based
approaches have made significant progress in learning from texts and images in recent years.
These methods enable simultaneous learning from different types of representations
(embeddings). Substantial advancements have also been made in joint learning from different
types of spaces. Additionally, other modalities, such as sound, physical signals from the
environment, and time series data, have recently been explored. Multimodal machine
learning, which involves processing and learning from data across multiple modalities, has
opened up new possibilities in a wide range of applications, including speech recognition,
natural language processing, and image recognition. From Unimodal to Multimodal Machine
Learning: An Overview gradually introduces the concept of multimodal machine learning,
providing readers with the necessary background to understand this type of learning and its
implications. Key methods representative of different modalities are described in more detail
to convey the peculiarities of various types of data and how
multimodal approaches tend to address them (although, in some cases, not yet). The book examines
the implications of multimodal learning in other domains and presents alternative approaches
that offer computationally simpler yet still applicable solutions. The final part of the book
focuses on intriguing open research problems, making it useful for practitioners who wish to
better understand the limitations of existing methods and explore potential research avenues to
overcome them.