Connecting the worlds of biologists and computer scientists
Monika Čechová studied bioinformatics at the Faculty of Informatics of Masaryk University (FI MU) and went on to doctoral studies at Penn State University in the United States. Her research spans biology and bioinformatics, focusing on sex chromosomes, satellite biology, non-B DNA and reproductive biology. She distilled her experience of bioinformatics collaboration into ten rules in her article "Ten simple rules for biologists initiating a collaboration with computer scientists" in the journal PLOS Computational Biology. We talked with her about this article and her research.
What led you to study bioinformatics?
When I was in high school, I knew I wanted to study computer science because I saw its vast potential. In my second-to-last year, we covered genetics, which absolutely fascinated me. Then I came across an article by Associate Professor Fatima Cvrčková entitled "Bioinformatics - halfway between algorithms and life", and for the first time I realized that these two worlds, life as represented by biology, and informatics, can be connected. Masaryk University offered great freedom and flexibility in choosing subjects, so I could combine my studies as I saw fit.
Do you think bioinformatics studies cover a broad range of both computer science and biology?
In my bachelor's degree I studied only computer science; I chose bioinformatics only for my master's. Throughout my studies, I enrolled in courses from both fields. At the Faculty of Science, I got into genetics, research with Drosophila and pipetting. I also completed several computer science courses at FI.
So you signed up for every course you were interested in and enjoyed?
Did the study of bioinformatics prepare you for research work?
It probably depends on the stage and the area of research. In my scientific work, I very much appreciated that bioinformatics students were given no shortcuts: we took every course a computer science student would. If I could turn back time and advise my younger self, I would enrol in more theoretical and mathematical subjects. That kind of knowledge, for example of algorithmic complexity, is much harder to acquire on the job.
Bioinformatics can also mean two different things. It can mean developing new bioinformatics algorithms and protocols, something that moves the field forward, or it can mean applied bioinformatics, where one uses bioinformatics tools to answer biological questions. In the latter case, bioinformatics is more of a tool. When it comes to practice, hands-on experience is irreplaceable. Because I had studied computer science, I felt that I had knowledge and expertise in specific areas that a person from a biological background does not have. This actually brings us to interconnection and interdisciplinarity.
This is also the subject of your article in PLOS. You set out ten rules for good collaboration between biologists and computer scientists. What motivated you to write it?
Long-term experience of working in bioinformatics. During my master's degree, I worked at the Institute of Biophysics of the Academy of Sciences, in a biological laboratory where at first I was the only bioinformatician. The team gradually grew, and there were more and more of us. In the beginning, however, I was the only one, which brings great challenges, because biologists think differently. It's not that, as movies and TV series often portray it, people are born thinking differently; it's training: what a person learns during their studies, what kind of thinking they are led to and what experience they have. Some of my experiences were frustrating, but it was also a challenge for the biologists to communicate with me and to understand what was important to me.
At Penn State University, it was common for a computer scientist, a statistician and a biologist to sit in the same room. From all these experiences, I kept thinking about why this interaction is interesting and useful, what the barriers to good collaboration are, and what we need to learn from each other to communicate better. I had been writing this article in my head for several years before I finally put it down on paper.
Did it take you long to settle into the team? Was it difficult to overcome the barriers?
I think it's not just about getting used to new people, but about learning to do this for the first time. Suppose someone has spent the last five years at FI MU and suddenly finds themselves the only computer scientist in a biological lab. They have to gain a whole new set of experience, including soft skills. This process takes several years, but once you have this experience, any further collaboration is much, much easier. For this reason, I would advise all bioinformatics students to take every opportunity, as early as possible, to talk to biologists or statisticians: attend the same summer school or the same lecture, go for coffee or virtual coffee together, or go to a conference. The more such shared interactions they have, the better the science they do together.
After your master's degree, you went to Penn State in the United States for your doctorate. Why did you choose America?
At the Institute of Biophysics, I studied plant sex chromosomes with Associate Professor Eduard Kejnovský. At a conference in France, I met my future supervisor, who was studying the sex chromosomes of apes, which was a great opportunity for me to continue with the topic elsewhere.
What was it like in America? Have you noticed any significant differences compared to the Czech Republic?
It was very different in that bioinformatics at Penn State operated as a separate field. That means there are laboratories headed by a bioinformatician, where the whole laboratory focuses only on bioinformatics. This is the direction I described earlier: developing new bioinformatics algorithms and pushing the boundaries of what bioinformatics can do.
At the same time, individual laboratories and departments are closely connected, and many lectures are open across the whole university. Students have multiple opportunities to enrol in the same courses and attend the same workshops. One of the most amazing things was that interdisciplinarity was really lived there in everyday life. There were even special training programs designed specifically to strengthen interdisciplinarity across biology, computer science and statistics. There was also generous funding for more everyday needs when it came to designing experiments.
As I said, I enrolled in various subjects at FI MU. Sometimes it was difficult for me to understand the broader context of a field, and I had to trust that it would prove useful somewhere. At Penn State, by contrast, the entire first class (a full 50 minutes) was devoted to why the subject was important. The first time this happened to me, I felt it was a waste of time, but in retrospect I see it very positively.
Didn't you want to stay in America?
One reason for returning was to be close to family again. During our studies, my husband and I were raising a son, so one reason was for him to learn Czech and live close to his family. The second reason was that we believe research in both the life sciences and bioinformatics is on the rise here, so there will be opportunities, if not now, then in the future.
So do you think future bioinformaticians will find jobs in Brno?
I'm sure of it. And it is already clear that this kind of research is in short supply. For example, in the current coronavirus epidemic, it is essential to monitor which coronavirus variants occur in which countries, how the virus changes and mutates, and what the consequences of those mutations are. This is something bioinformatics can do and will do.
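At its core, such monitoring means comparing a sample's genome with a reference and listing the differences. The following is a minimal Python sketch, assuming the two sequences have already been aligned base by base with no insertions or deletions (real pipelines first align reads to a reference and also handle indels); the function `point_substitutions` and the sequences are hypothetical.

```python
def point_substitutions(reference: str, sample: str) -> list[str]:
    """List single-base substitutions between two sequences that are already
    aligned position by position (same length, no insertions or deletions).
    Each change is written as <reference base><1-based position><sample base>."""
    if len(reference) != len(sample):
        raise ValueError("sequences must be pre-aligned to the same length")
    changes = []
    for pos, (ref_base, alt_base) in enumerate(zip(reference, sample), start=1):
        # 'N' marks a base that could not be read; skip it rather than call a mutation
        if alt_base != ref_base and alt_base != "N":
            changes.append(f"{ref_base}{pos}{alt_base}")
    return changes

reference = "ATGTTTGTTTTTCTTGTT"  # hypothetical reference fragment
sample = "ATGTTTGTCTTTCTTGNT"     # hypothetical sample with one change and one unread base
print(point_substitutions(reference, sample))  # ['T9C']
```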
You studied the Y chromosome in apes. Why is this topic so important? What is the motivation behind it?
The main motivation is that the Y chromosome contains genes that are important for fertility and spermatogenesis. At the same time, its structure makes it very difficult to study; it is not easy to obtain its complete sequence, even in humans. We know that the human Y chromosome is dynamic, meaning that various rearrangements occur in which part of the chromosome is lost entirely. The result can be infertility.
Among humans' closest relatives, the apes, we already had information about the chimpanzee Y chromosome. Earlier work had found that the chimpanzee Y chromosome is entirely different from the human one, as if the chimpanzee did not need many of the genes that are essential in humans. That was one of the great mysteries. By fully assembling and examining the Y chromosomes of the bonobo and the orangutan, we were able to put this puzzle together and answer whether the chimpanzee is unique in this respect and how the chromosome has evolved.
Specifically, chimpanzees are highly promiscuous, so the evolution of their Y chromosome may be related to evolutionary pressure to produce large amounts of sperm.
What is the procedure for such research?
The biggest challenge was that there were no computational procedures for assembling the Y chromosome. All existing approaches were laborious and costly and would take many years. Our goal, by contrast, was to assemble these chromosomes using next-generation sequencing, which includes several technologies, so we wanted to find a way to combine them. For example, we can read very short DNA sequences very accurately, or very long sequences very inaccurately. Finding the right combination was therefore really important for us.
In one paper, colleagues analyzed 100 men and found 98 different combinations of gene copy numbers on the Y chromosome. This means that one perfectly healthy man may have, for example, 15 copies of a gene, while another may have 20. Both are healthy and fertile, so it remains a mystery how there can be so much variability in these genes.
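One way to picture why combining the two kinds of reads helps: many accurate short reads can out-vote the errors in one long but noisy read. The following is a toy Python sketch of majority-vote polishing, not the procedure the team actually used; it assumes each short read's start position on the long read is already known (a real workflow would align the reads first), it fixes substitution errors only, and the function name and reads are hypothetical.

```python
from collections import Counter

def polish_with_short_reads(draft: str, placed_reads: list[tuple[int, str]]) -> str:
    """Correct substitution errors in a long-read draft by majority vote of
    accurate short reads. Each short read is given with its 0-based start
    position on the draft; alignment itself is assumed, not performed."""
    votes = [Counter() for _ in draft]  # one vote counter per draft position
    for start, read in placed_reads:
        for offset, base in enumerate(read):
            pos = start + offset
            if 0 <= pos < len(draft):
                votes[pos][base] += 1
    polished = []
    for draft_base, counter in zip(draft, votes):
        # keep the draft base wherever no short read covers the position
        polished.append(counter.most_common(1)[0][0] if counter else draft_base)
    return "".join(polished)

long_read = "ACGTTAGCAT"  # hypothetical long read with an error at position 4
short_reads = [(0, "ACGTCA"), (2, "GTCAGC"), (4, "CAGCAT")]
print(polish_with_short_reads(long_read, short_reads))  # ACGTCAGCAT
```

The long read supplies the overall layout that short reads alone cannot resolve in repetitive regions, while the short reads supply base-level accuracy.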
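Copy numbers like these are commonly estimated from sequencing depth: if reads from every copy of a gene end up on a single reference copy, the gene's depth is roughly its copy number times the depth of a region present only once. A minimal Python sketch under exactly that assumption; the function name is hypothetical and the depth values are invented to reproduce the 15-copy example above.

```python
def estimate_copy_number(gene_depths: list[float], single_copy_depths: list[float]) -> int:
    """Estimate how many copies of a gene a sample carries by dividing the mean
    sequencing depth over the gene by the mean depth over a region known to be
    present once (e.g. a single-copy region of the Y chromosome in a male)."""
    gene_mean = sum(gene_depths) / len(gene_depths)
    baseline = sum(single_copy_depths) / len(single_copy_depths)
    # assumes reads from all copies pile up on one reference copy,
    # so depth scales roughly linearly with the number of copies
    return round(gene_mean / baseline)

print(estimate_copy_number([148.0, 152.0, 150.0], [10.0, 9.5, 10.5]))  # 15
```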
Are there any suitable devices for your research in Brno?
Partly yes, but some are in Vienna. Another option is sequencing with Oxford Nanopore, a tiny box that plugs into a computer via USB and can start sequencing immediately. Its reads are currently very long but very error-prone, although this is gradually improving. Sequence quality keeps getting better, partly thanks to advances in machine learning and computer science, because we can convert the electrical signal into the sequence itself more accurately. Quite a few laboratories in Brno use this method. The challenge is to prepare the sample with as little damage as possible, so that the DNA stays intact and as long as possible. Moreover, the analysis and interpretation must be of good quality, which is always a challenge.
And how is this data then represented on the computer?
Interestingly, in its most traditional form, it is a text file. At Penn State, I also worked as a teaching assistant, teaching bioinformatics to American students. In the practical sessions, we obtained DNA from various animals; the students did not know what they had received and had to determine it. We prepared the sample for sequencing, which takes several hours, then loaded it and started the run. Within 2 minutes, the first sequences were available, so we immediately had a text file and could read what was in the sample.
It also depends on how deep the researcher wants to go with the analysis. It is always possible to go back a step and look at the electrical signal. During sequencing, the molecule passes through an artificial pore, which changes the electrical signal. Many algorithms, including neural networks, try to translate this signal into a sequence.
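One widely used text format for sequencing reads is FASTQ, in which each read takes four lines: an identifier, the bases, a separator and a per-base quality string. A minimal Python reader, assuming unwrapped four-line records and the usual Phred+33 quality encoding; the function name and the record shown are made up.

```python
def parse_fastq(text: str):
    """Yield (read_id, sequence, qualities) for each record in FASTQ text.

    A record is four lines: '@' + identifier, the bases, a '+' separator,
    and one quality character per base. Qualities are decoded assuming the
    common Phred+33 encoding (score = ASCII code - 33)."""
    lines = text.strip().splitlines()
    if len(lines) % 4 != 0:
        raise ValueError("FASTQ text must contain complete 4-line records")
    for i in range(0, len(lines), 4):
        header, seq, plus, qual = lines[i:i + 4]
        if not header.startswith("@") or not plus.startswith("+") or len(seq) != len(qual):
            raise ValueError(f"malformed FASTQ record starting at line {i + 1}")
        read_id = header[1:].split()[0]  # identifier = first word after '@'
        yield read_id, seq, [ord(c) - 33 for c in qual]

record = "@read1 runid=abc\nACGT\n+\n!+5?\n"
for read_id, seq, quals in parse_fastq(record):
    print(read_id, seq, quals)  # read1 ACGT [0, 10, 20, 30]
```

A Phred score Q corresponds to an estimated error probability of 10^(-Q/10), so the last base above (Q30) has roughly a 1 in 1,000 chance of being wrong.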
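To give a feel for what translating the signal into a sequence means, here is a deliberately simplified Python toy. It cuts the signal into segments wherever the current jumps and assigns each segment the base with the nearest current level. The current levels, the jump threshold and the function names are invented for illustration; real nanopores respond to several bases at once, and modern basecallers use neural networks on the raw signal rather than anything like this lookup.

```python
from statistics import mean

# Hypothetical current level (pA) for each base; real pore models are far more complex.
TOY_LEVELS = {"A": 80.0, "C": 95.0, "G": 110.0, "T": 125.0}

def segment(signal: list[float], jump: float = 7.5) -> list[list[float]]:
    """Split the raw signal into events wherever the current jumps by more than `jump`."""
    events = [[signal[0]]]
    for prev, cur in zip(signal, signal[1:]):
        if abs(cur - prev) > jump:
            events.append([])  # a large step marks a new base entering the pore
        events[-1].append(cur)
    return events

def toy_basecall(signal: list[float]) -> str:
    """Assign each event the base whose toy level is closest to the event's mean current."""
    bases = []
    for event in segment(signal):
        level = mean(event)
        bases.append(min(TOY_LEVELS, key=lambda b: abs(TOY_LEVELS[b] - level)))
    return "".join(bases)

raw = [79.1, 80.8, 80.2, 111.0, 109.4, 110.5, 96.1, 94.2, 124.3, 125.9]
print(toy_basecall(raw))  # AGCT
```

A run of identical bases produces no jump, so this toy would call it as a single base; such homopolymer runs are a known source of errors for real basecallers too.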