Cellular processes are regulated by interacting networks of macromolecules within a dynamic and crowded environment. Deciphering the mechanisms underlying these interactions is therefore pivotal to understanding how cells function. Cryo-electron tomography (cryo-ET) provides 3D views of the native cellular environment at molecular resolution and thus has the potential to create a molecular atlas of the cell. However, this endeavor will require powerful new computational tools for structural identification.
Unfortunately, cryo-ET data has a low signal-to-noise ratio and is susceptible to reconstruction artifacts, which makes identifying macromolecules in cellular tomograms challenging. While the commonly used template matching approach is relatively successful at localizing large complexes such as ribosomes, it is slow and requires substantial computational resources. Furthermore, template matching struggles to accurately identify smaller complexes, which make up much of the molecular diversity of the cell. As a result, usually only a few classes of particles are analyzed in each tomogram, neglecting numerous other macromolecular species embedded within the cellular environment.
To overcome these obstacles, an international team of scientists from France, Spain and Germany, under the leadership of Charles Kervrann, developed a deep learning-based framework to quickly identify multiple classes of macromolecules in cryo-ET volumes. This program, DeepFinder, now published in Nature Methods, builds upon convolutional neural networks, which have already proven highly valuable in the microscopy field. The program consists of a training stage and an analysis stage: the former is a supervised approach requiring expert-user input, whereas the latter is nearly unsupervised and very fast, a significant improvement over template matching. In a close collaboration between the Helmholtz Pioneer Campus (HPC) group of Ben Engel and the Helmholtz_AI group of Tingying Peng, computational specialists Ricardo Righetto and Lorenz Lamm played pivotal roles in assessing the performance of the new algorithm for small and membrane-bound protein complexes.
Through an extensive validation process, the team demonstrates that DeepFinder can efficiently identify molecular complexes with variable shapes and molecular weights within the crowded cellular environment. Importantly, DeepFinder is significantly faster (~10x) than the conventional template matching procedure and performs better at identifying various macromolecules than competing deep learning methods. Thus, DeepFinder not only processes large datasets in a single day, but also allows for the simultaneous identification of several macromolecular species. The algorithm’s accuracy is comparable to that of expert-supervised ground truth annotations, without the need for further complex and time-consuming classification steps. Last, but not least, DeepFinder has been implemented as a free, open-source program with an accessible graphical user interface.
DeepFinder is a promising new AI-powered algorithm for the semi-automated analysis of a wide range of molecular targets in cellular tomograms, and it will significantly contribute to developing visual proteomics in the near future. It also serves as a prime example of the importance of developing efficient, customized AI tools to accelerate knowledge generation in the biomedical life sciences, an ambition sparked at the interface of the Helmholtz Pioneer Campus, the Institute for Computational Biology, and the national Helmholtz_AI initiative.