How our AI understands German tax law — part three
In part 2 of this series of articles, we presented the methods of network analysis and text classification. In this article Steffen Kirchhoff (CTO and co-founder of Taxy.io) and me will give you some further insights into the Natural Language Processing techniques our digital tax brain uses.
Named Entity Recognition
In Named Entity Recognition, texts are broken down into smaller units: so-called entities. During this atomization, terms and word groups can be recognized and assigned to different groups, such as “person”, “date”, “place” and “technical term”. When analyzing a judgment, for example, the groups “plaintiff” and “defendant” can also be extracted automatically. It is interesting to recognize connections, that is, to reunite extracted connections. If, for example, a judgment refers to the plaintiff’s daughter as “daughter (D)” at the beginning, to which reference is later made as “she” or “D”, our algorithm is be able to understand this connection.
Application of Named Entity Recognition on a judgement text. Among others, it is identified that the judgment is about child benefit and that it is dealing with education-related issues.
Language Modeling
Those who are interested in NLP will also have heard about the methods of language modeling. This area in particular has recently seen groundbreaking advances in research. Particularly noteworthy is the model architecture of the so-called Transformer which was published by Google and on which the very successful models BERT by Google AI Language and GPT-2 by OpenAI are based.
Behind the methods of language modeling are processes by which words are mapped into high-dimensional vector spaces and similar words lie close together.
This helps algorithms learn semantic relationships based on the assumption that terms that often appear in a similar context often have a similar meaning.
In language modeling, models are often first trained in German language more generally and then, during fine-tuning, specialized for different fields of application. To speak metaphorically, our model which first learned the basics of the German language at school is then sent to university to study tax law where, building on its knowledge of German, our model now learns the linguistic subtleties of tax law.
Modeling German Tax Language
At Taxy.io we develop and train language models for the German language with a special focus on the “dialect” of German tax law.
This means that our model learns, for example, that the relationship between landlords and rental income is similar to the one between entrepreneurs and revenues. At the same time (when looking at parallel vectors), the model learns concepts meaningful to humans. In the aforementioned example, this is how the model learned the concept of “income”.
A possible use case for the Taxy.io model could be that judgments found for engineers may also be relevant for programmers. The algorithm recognizes that — in the context of German tax law — both professions can be characteristic of freelancers, but that this characteristic depends on certain conditions. If a judgment is made on this question, for example if it is decided in which cases an engineer is to be classified as a freelancer, the reasons for the decision in the individual case might also be used for argumentation in similar cases involving the professional group of programmers.
Conclusion and Outlook
The importance of knowledge management will continue to increase for all tax consultancies and auditing companies. Equally, individual client cases will become more complex, while clients will place ever greater demands on the response times of the consultant.
In order to meet clients’ requirements for high quality and timely delivery, efficient knowledge management is fast becoming crucial in everyday practice. This is where artificial intelligence can be an important lever, for example through continuous analysis of client data and automated consideration of current case law. Partially automated responses to client enquiries and assisted document creation are two solutions already used by tech-savvy law firms.
Although technological disruption is still a considerable way away, the first step in this direction has already been taken through the use of Natural Language Processing tools.
Comments