News and events archive

From the faculty


    Unlocking Life’s Code: Prajna Hebbar on the Future of Genomics

    How do we make sense of the vast information hidden in genomes? And what happens when we finally complete them, end to end? Prajna Hebbar from the University of California, Santa Cruz works at the frontier where computer science meets biology, developing tools to annotate and compare genomes across species. Visiting the Faculty of Informatics MU, she shared her recent advances in the Comparative Annotation Toolkit 2.0 and discussed how complete genomes are transforming our understanding of evolution and human disease.


    Could you walk us through your research journey? What brought you to the field of comparative and pan-genomics, which is based on examination and comparison of whole genome sequences and structures between different species? What excites you most about working in this area?

    I started out as a computer science undergrad interested in the intersection of CS and biology. That curiosity led me to explore several diverse research experiences across computational biology over two years, which ultimately motivated me to pursue a PhD in the field. While exploring prospective universities and directions for my PhD, I came across the research in comparative genomics and pangenomics being carried out in what is now my lab, and I never looked back. I really enjoy working at the cutting edge of genomic research, which gives me access to the latest high-quality datasets and lets me develop tools to analyze them.

    The genome is essentially a collection of genetic information about a specific organism hidden in its DNA. In one of your talks at FI MU, you focused on the Comparative Annotation Toolkit 2.0 (CAT2.0), a software tool for the automated, high-quality annotation of multiple closely related genomes. For readers who may not be specialists: why is accurate gene annotation such a critical step in genomics?

    I like to think of gene annotation as providing meaning to the genome: it tells us what is where and what it does. Most downstream users of a genome depend on these annotations to make biological inferences. If the annotation is wrong or incomplete, every analysis that builds on it could be compromised. That is why accurate gene annotations are essential for reliable science.

    CAT2.0 aims to simplify a process that often requires multiple tools and extensive manual work. How does it achieve this?

    CAT2.0 integrates multiple sources of evidence for gene annotation: reference-free alignments (finding regions of similarity across genomes without relying on one as a template), pairwise alignments, protein alignments, and evidence of gene expression in tissues. Each of these approaches has its strengths and weaknesses. By combining them, CAT2.0 increases accuracy while reducing the need for manual curation. This efficiency is particularly important in large-scale projects such as the Human Pangenome or T2T Primates, which map entire populations of species and in which dozens or hundreds of genomes must be annotated consistently and quickly.

    You’ve been directly involved in projects producing complete genomes, such as the telomere-to-telomere (T2T) assembly of the common marmoset, that is, an assembly covering each chromosome from one end to the other. Why is having a fully complete genome so important for evolutionary and biomedical research?

    Complete genomes fill in the missing pieces, especially in complex and repetitive regions like segmental duplications, centromeres, and satellite DNA. These regions often play key roles, for example in gene regulation and evolution, but were historically understudied because they were so hard to assemble. With complete genomes, we gain more accurate mapping statistics, better insight into structural variation, and a better, more precise reference for biomedical research.

    The marmoset was your focus in the second talk. What makes this New World monkey such a valuable model organism, and what new insights can we expect from its complete genome?

    The common marmoset is a valuable research model organism due to its small size, high reproductive rate, and biological similarity to humans. Having a telomere-to-telomere genome opens up new opportunities: we can better study regions linked to neurological traits, identify structural variants, and refine its use as a model for human disease.

    Your research often spans international collaborations. How do projects like the Human Pangenome manage cooperation across so many labs and scientists?

    These large international projects succeed thanks to a commitment to open science and solid communication. Teams share their data and workflows openly, which allows researchers from across the world to contribute. These projects also hold workshops and conferences regularly to bring a wider audience into the fold.

    You visited FI MU at the invitation of Monika Čechová. Could you tell us more about your collaboration: how did it begin, and what directions are you exploring together?

    Monika and I met when I started my PhD while she was a postdoc in Karen Miga’s group at UC Santa Cruz. She has been a mentor and a collaborator since then. We share an interest in pangenomes and in genome assemblies, especially in the biology of “complex” genomic regions like the acrocentric chromosomes or the Y chromosome. Our collaboration continues to explore new methods and datasets to better understand these challenging regions.


    Photo: Prajna Hebbar and Monika Čechová

    Many students at FI MU are just entering the field of bioinformatics. What advice would you give them if they want to contribute to cutting-edge projects applied to genome assemblies or annotation pipelines?

    I think the most important thing is to start by building a strong foundation in both biology and computational methods; the two are equally important. It’s important to get very comfortable with coding and with working in high-performance computing environments, because both are essential for bioinformaticians. At the same time, try to explore and understand the biological questions driving the research. Finally, don’t be afraid to talk to researchers: it’s one of the best ways to learn and network.

    Looking ahead, what do you see as the next big challenge or opportunity in comparative genomics and annotation — something today’s students might be working on during their careers?

    One of the biggest challenges will be scaling our current comparative genomics methods to thousands and tens of thousands of genomes. We are moving towards an era in genomics where we will not just have a single “reference” per species, but comprehensive pangenome references that capture population-level diversity. We need to develop tools that can handle this complexity and are accessible to the broader research community. I think this is going to be an important challenge for future researchers. 

    Prajna Hebbar is a PhD student in the Department of Biomolecular Engineering at the University of California, Santa Cruz, under the supervision of Dr Benedict Paten. She is extremely interested in developing methods for comparative and pan-genomics, with a focus on gene annotation. Her most recent efforts have been towards annotating the human pangenome and building a complete human transcriptome, a kind of map of all genes and their activity in the human body. She is also working to assemble and annotate the complete telomere-to-telomere (T2T) genome of the common marmoset. She has previously played a key role in the T2T Primates project.

    She visited the Faculty of Informatics, Masaryk University, on 22 and 23 September 2025 to give lectures entitled “Comparative Annotation Toolkit 2.0: Generating high-quality gene annotations on complete primate genomes and human pangenomes to study gene evolution” and “Complete Genome Assemblies for the Common Marmoset”.

    Author: Marta Vrlová, Office for External Relations and Partnerships at FI MU
