Multimodal data, comprising interdependent unstructured text, image, and audio data that collectively characterize the same source (with video being a prominent example), offer a wealth of information for strategy researchers. We emphasize the theoretical importance of capturing the interdependencies between different modalities when analyzing multimodal data. To automate the analysis of video data, we introduce advanced deep learning and data fusion methods that comprehensively account for all intra- and inter-modality interdependencies. Through an empirical demonstration focused on measuring the trustworthiness of grassroots sellers in live-streaming commerce on TikTok, we highlight the crucial role of interpersonal interactions in the business success of microenterprises. We provide access to our data and algorithms to facilitate data fusion in strategy research that relies on multimodal data.