This thesis presents a scalable, generic methodology for microbial phenotype prediction based
on supervised machine learning, several models for biological and ecological traits of high
relevance, and their deployment on metagenomic datasets. The results suggest that the presented
prediction tool can be used to automatically annotate phenotypes in near-complete microbial
genome sequences, which current metagenomic studies generate in large numbers. Unraveling
relationships between a living organism's genetic information and its observable traits is a
central biological problem. Phenotype prediction facilitated by machine learning techniques
will be a major step toward creating biological knowledge from big data.