    AI fixes Opravidlo

    Opravidlo, the successful language corrector from the Faculty of Arts of MU (FF MU), will be improved with the help of artificial intelligence, thanks to a collaboration with a research team from the Faculty of Informatics of MU (FI MU) and support from the European project OSCARS through its Open Call for Open Science Projects and Services. What can Opravidlo do today, and what can we look forward to in the future? The answers were provided by Assoc. Prof. RNDr. Aleš Horák, Ph.D., and RNDr. Zuzana Nevěřilová, Ph.D., from FI MU and Mgr. Hana Žižková, Ph.D., from FF MU, representing the project team.

    Please give us the details of the project and its implementation within OSCARS.

    The language corrector Opravidlo Beta can correct some types of mistakes in Czech: grammatical errors (for example punctuation and some types of syntax), spelling errors (for example capitalization), and formatting errors (spaces, brackets, currencies, etc.). The aim of the project is to significantly improve this application. We took part in the first OSCARS call. OSCARS projects are committed to Open Science, so in addition to the app, which is already freely available, our project aims to provide scientific data to a wider community. Language correctors are in demand for other languages as well, and developers working on other Slavic languages in particular will find our data helpful for speeding up their own development. Analyzing where people make mistakes in text is also interesting for both linguists and language teachers.

    In the Opravidlo 2.0 project, we will create a new tool based on deep neural networks and large language models, and on the data and rules created in the current version of Opravidlo.

    Who are you working with on the project?

    FI MU and FF MU submitted the project jointly. The principal investigator is Assoc. Prof. Horák from FI. Some of the team members are the same as in the previous project, Opravidlo Beta, which was created thanks to a grant from the Technology Agency of the Czech Republic.

    The current version of Opravidlo was mainly developed at the Faculty of Arts of MU. Opravidlo 2.0 will build on artificial intelligence techniques and on the results of the current Opravidlo, so it involves inter-faculty cooperation between teams from FI and FF.

    Please describe how Opravidlo 2.0 will improve on the previous version.

    Opravidlo Beta is based on hand-crafted rules and has high precision: if it flags a piece of text as an error, it most likely is an error. On the other hand, the corrector has lower coverage, which means it will miss some errors. The goal of the project is to increase error coverage without introducing false positives, where the application flags an error in text that is actually fine. In addition to manually created rules, we want to use neural networks to pick up nuances of the text and to estimate how common or uncommon a sentence construction is.
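    The trade-off described above, between how often a flagged error is real and how many real errors are found, corresponds to what is usually called precision and recall. The following is a minimal sketch with invented token positions, not data or code from Opravidlo; the function name and the numbers are assumptions made for illustration.

```python
def precision_recall(flagged, actual):
    """Return (precision, recall) for sets of flagged and real error positions."""
    flagged, actual = set(flagged), set(actual)
    true_positives = len(flagged & actual)
    # Precision: share of flagged positions that are real errors.
    precision = true_positives / len(flagged) if flagged else 1.0
    # Recall (coverage): share of real errors that were flagged.
    recall = true_positives / len(actual) if actual else 1.0
    return precision, recall


# Hypothetical example: five real errors, a cautious rule set flags two of them.
actual_errors = {3, 7, 12, 18, 25}
rule_based_flags = {3, 12}

precision, recall = precision_recall(rule_based_flags, actual_errors)
print(f"precision={precision:.2f}, recall={recall:.2f}")
```

    In this toy case every flag is correct, but most errors go unnoticed, which is the situation the project aims to improve without giving up precision.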

    Another important component of the application is explainability, which becomes a central concern once neural networks are used. With AI applications, there is often a risk that people do not understand why the system made a decision. This is a significant downside of AI applications and reduces their real-world applicability. Currently, Opravidlo Beta provides explanations for many of the errors it finds - these explanations are part of the rules (e.g. a rule that finds an error in subject-predicate agreement "knows" that this is the problem). It will be hard to find relevant explanations for the errors identified by the neural network, but without them, users would have less confidence in the application.

    Research team from FF and FI MU (photo: Ondřej Vedral/FF MU)

    How are your roles divided in the team?

    The corrector not only suggests a correction, but also explains why the highlighted text is an error. The explanations must be both correct in context (so that, for example, the application does not explain a missing comma in a sentence where there is an enumeration) and understandable. The FF team is responsible for this part. The FI team will work on neural networks that will evaluate how common a certain part of the text or sentence construction is and what the probability of an error is. We will also be responsible for the data, its storage, its metadata and its publication in large research infrastructures. Finally, we will integrate the two approaches so that rules and probabilistic outputs fit together.
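    One way to picture how rule-based findings and probabilistic outputs could fit together is sketched below. This is an illustrative assumption, not the project's design: the `Finding` type, the `merge_findings` function and the 0.9 threshold are all hypothetical. Rule findings are always kept because they carry an explanation; model findings are added only when they are confident enough and do not overlap a rule finding.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Finding:
    start: int                          # character offset where the error starts
    end: int                            # character offset where the error ends
    suggestion: str                     # proposed replacement text
    explanation: Optional[str] = None   # rules provide one, the model may not
    probability: float = 1.0            # model confidence; rules count as certain


def overlaps(a: Finding, b: Finding) -> bool:
    """True if the two findings cover at least one common character."""
    return a.start < b.end and b.start < a.end


def merge_findings(rule_findings: List[Finding],
                   model_findings: List[Finding],
                   threshold: float = 0.9) -> List[Finding]:
    """Keep all rule findings; add confident, non-overlapping model findings."""
    merged = list(rule_findings)
    for candidate in model_findings:
        if candidate.probability < threshold:
            continue  # too uncertain: would risk a false positive
        if any(overlaps(candidate, kept) for kept in rule_findings):
            continue  # the rule already covers this span and explains it
        merged.append(candidate)
    return sorted(merged, key=lambda f: f.start)


# Hypothetical findings for one input text.
rules = [Finding(10, 15, ", že", "Missing comma before a subordinate clause.")]
model = [
    Finding(12, 14, "x", None, 0.95),  # overlaps a rule finding: dropped
    Finding(30, 35, "y", None, 0.97),  # confident and new: kept
    Finding(40, 42, "z", None, 0.50),  # below threshold: dropped
]
result = merge_findings(rules, model)
print([(f.start, f.end) for f in result])
```

    A high threshold favours precision over coverage; lowering it would catch more errors at the cost of more false positives.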

    Are FI students involved in the project?

    Yes, we involve bachelor's, master's and doctoral students as developers and experimenters.

    Who can access Opravidlo and where can the public find it?

    Opravidlo is freely available on the website www.opravidlo.cz. You can either type or paste your text into the text box. The tool will then underline the incorrect places in the text, suggest corrections and offer a link to the Czech Internet Language Guide with an explanation.

    How long will the project run and what are the next steps?

    The project runs for two years and started in October 2024. The first step, before the actual launch of the project, was a promotional video. This was followed by setting up the team, assigning roles and launching the first individual work packages according to the planned schedule.

    When will an updated version of Opravidlo be available?

    We anticipate that it could be within two years.

    Are there any comparable tools? Why should we choose Opravidlo?

    Tools such as ChatGPT can also correct Czech text, and do so quite reliably. Their disadvantage is that they do not highlight the error in the text, sometimes loosely rephrase the input text, and cannot reliably provide a linguistic explanation of why a phenomenon is correct or incorrect. Opravidlo has the advantage of explaining the error and showing exactly where it is in the text.

    Another benefit of Opravidlo is its openness and transparency. Users will be able to trust that the application does not collect their data without their consent, that the outcome of the correction will always be the same and predictable, and that they will receive a correct explanation with the proposed correction. You can ask ChatGPT for an explanation, but for languages it is not primarily designed for, you will often get the wrong answer.

    Anything else you want to mention?

    For us, it is interesting that OSCARS targets the use of large research infrastructures, whose usefulness is, in my opinion, not very clear to the public. For us, however, it is quite straightforward - we have been part of the CLARIN infrastructure (language data and software tools) for a long time. Thanks to CLARIN (LINDAT/CLARIAH-CZ in the Czech Republic) we can publish data and tools that someone else can use, and of course the same applies in reverse. The infrastructure offers a secure, long-term sustainable environment for scientific results. Other such infrastructures include IT4Innovations (PRACE), which offers capacity for scientific computing. More can be found under the keyword ESFRI.

    Thank you for the interview. We will continue to follow the development of the new tool.

    Author: Marta Vrlová, Office for External Relations and Partnerships at FI MU

