Ana(stasiia) Iurshina


About

Hi, I'm Ana, a backend developer and machine learning engineer. I have over 7 years of industry experience in software development and around 3.5 years of research experience, primarily in applying deep learning methods to natural language processing tasks such as entity linking, named entity recognition, and automatic text summarization. Most recently, I spent a year as a machine learning engineer in industry.
This is pretty much a clickable version of my CV, plus a tiny addition of some personal info (Books and Pictures sections).
Apart from pure coding, debugging (I 💛 debugging!) and training models, I'm interested in the following topics:
If you have ideas related to these topics and need technical help or just someone to collaborate with, feel free to contact me (by email: anastasiia.iurshina @ gmail.com or on LinkedIn).

Work Experience

09.2023
now
Senior Machine Learning Engineer at GFT
Tasks: made a significant contribution to the design and development of a chat application that is used in production by more than 3000 users;
developed a testing framework for a RAG pipeline;
mentored junior developers;
Environment: Python, LLMs, Azure, FastAPI, Git
07.2023
09.2023
Senior Machine Learning Engineer at Insaas
Tasks: prototyping a question answering system with a vector database, an LLM, FastAPI and Streamlit;
using an LLM to solve a business task: aspect-based sentiment analysis of product reviews
Environment: Python, LLMs, GCP, Git
07.2020
06.2023
Researcher at University of Stuttgart
Tasks: research in the area of neural entity linking;
deploying and supporting a website with a demo (nginx, docker compose, SSL);
teaching assistant for "Intro to AI" course;
student theses supervision (e.g. "Extracting and Segmenting High-Variance References from PDF Documents with BERT")
Environment: Python, PyTorch, Git, Docker
01.2020
03.2020
Research Intern at Bosch Center for AI
Tasks: research project on the topic of task-specific named entity recognition in a multi-lingual setting
Environment: Python, PyTorch, Git
09.2019
12.2019
Research Science Intern at Amazon
Tasks: research project in the field of automatic summarization
Environment: Python, TensorFlow, Pandas, Git, AWS
04.2018
09.2019
Software Developer for NLP at Sony, Speech and Sound group
Tasks: investigating and implementing text processing components (tokenization, sentence segmentation, POS tagging, etc.)
Environment: Java, Git, Python, C++
04.2018
09.2019
Research assistant at Institut für Linguistik/Anglistik
Tasks: data preprocessing, extraction of sentences with a given structure based on dependency parsing results
Environment: Python (nltk, pandas, spaCy, etc.), MaltParser, Java
07.2014
09.2017
Senior Software Engineer, Team Lead at EPAM Systems
Tasks: implementing different design and coding tasks, leading a team of 3 developers
Environment: Java, Python, Git, Spring, Jenkins, CXF, REST, Maven, Tomcat, Talend, Virtuoso, PostgreSQL, MyBatis, Apache Ignite, Hadoop, Spark
07.2012
07.2014
Senior Software Engineer at GGA Software Services
Project: a system for interacting with liquid handling robot devices and.
Tasks: implemented robot manipulation logic, applied searching for connectivity components in a graph to a business task
Environment: Java, SVN, ORACLE, Spring, SOAP, Maven, Tomcat, Mule ESB, Apache Solr
07.2010
07.2012
Software Engineer at GGA Software Services
Tasks: developing software for pharmaceutical companies.
Environment: Java, SVN, ORACLE, Spring, SOAP, Maven, Tomcat, Quartz for job scheduling
01.2010
07.2010
Junior Software Developer at GGA Software Services
Tasks: implementing web services, working with a database.
Environment: Java, SVN, Swing, MySQL, Hibernate

Skills

Using (almost) every day:

Python *****
PyTorch *****
Natural Language Processing *****
Machine Learning *****
Deep Learning *****
Transformers package *****
Git *****
Docker *****
Python ML/NLP stack (NumPy, spaCy, scikit-learn, etc.) *****

Significant experience in the past or using from time to time:

Java, SQL, Spring, AWS

Familiar with (used in different projects throughout my career and studies):

Keras, TensorFlow, LaTeX, bash scripting, HTML, CSS, NoSQL databases, Scala, Hadoop, Spark, Elasticsearch, Solr, Lucene, Kafka, C++

Languages

English Fluent (C1/C2)
Russian Native
German Limited (B1+)
Arabic Basic (A1/A2)

Recent projects

The NIL-linking task in Entity Linking deals with cases where the text mentions do not have a corresponding entity in the associated knowledge base. We created a dataset for this task.
To achieve this, we compared two snapshots of WikiData (taken at different times) and identified entities that are present in the later snapshot but absent from the older one. For these entities we extracted the corresponding text from Wikipedia.
More information can be found in the paper.
The dataset is available on Zenodo.
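The snapshot comparison described above boils down to a set difference over entity IDs. A minimal sketch, assuming each snapshot has already been reduced to a list of IDs; the function name `new_entities` and the toy IDs are hypothetical and are not taken from the actual NILK code:

```python
def new_entities(older_ids, later_ids):
    """Return IDs present in the later snapshot but absent from the older one."""
    return sorted(set(later_ids) - set(older_ids))


# Toy snapshots with made-up entity IDs (not real WikiData content).
older_snapshot = ["Q1", "Q2", "Q3"]
later_snapshot = ["Q1", "Q2", "Q3", "Q4", "Q5"]

# Entities that only exist in the later snapshot are the NIL candidates
# with respect to the older knowledge base.
nil_candidates = new_entities(older_snapshot, later_snapshot)
print(nil_candidates)
```

Real snapshots are far too large for in-memory lists like these, so a production version would likely stream the dumps; the sketch only illustrates the comparison step.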

A prototype of a pipeline for processing of .tmx files (xml-like files used for translation annotation). Relies on the producer-consumer architecture (implemented using Kafka).
Stack: Python, Kafka, Docker
Code on GitHub
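The producer-consumer idea can be sketched without a broker. The example below is only an illustration: it swaps Kafka for Python's standard `queue.Queue` and threads, and the sample TMX content, `produce`, `consume` and `run_pipeline` are hypothetical names, not the code from the repository:

```python
import queue
import threading
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:lang attribute under the XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"
SENTINEL = None  # tells the consumer that no more items will arrive

# Tiny hand-written TMX-like sample (made-up content).
SAMPLE_TMX = """<tmx version="1.4"><body>
<tu><tuv xml:lang="en"><seg>Hello</seg></tuv><tuv xml:lang="de"><seg>Hallo</seg></tuv></tu>
<tu><tuv xml:lang="en"><seg>Thanks</seg></tuv><tuv xml:lang="de"><seg>Danke</seg></tuv></tu>
</body></tmx>"""


def produce(tmx_text, out_queue):
    """Parse translation units and put one {lang: segment} dict per unit on the queue."""
    root = ET.fromstring(tmx_text)
    for tu in root.iter("tu"):
        unit = {tuv.get(XML_LANG): tuv.findtext("seg") for tuv in tu.iter("tuv")}
        out_queue.put(unit)
    out_queue.put(SENTINEL)


def consume(in_queue, results):
    """Take units off the queue until the sentinel arrives."""
    while True:
        unit = in_queue.get()
        if unit is SENTINEL:
            break
        results.append(unit)


def run_pipeline(tmx_text):
    q = queue.Queue(maxsize=10)  # bounded queue: the producer blocks when it is full
    results = []
    consumer = threading.Thread(target=consume, args=(q, results))
    consumer.start()
    produce(tmx_text, q)
    consumer.join()
    return results


pairs = run_pipeline(SAMPLE_TMX)
print(pairs)
```

With Kafka, the queue would become a topic and the producer and consumer would run as separate processes, which is what lets the stages scale independently.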

In this work, we explored multilingual methods for the extraction of temporal expressions from text and investigated adversarial training for aligning embedding spaces to one common space.
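The work resulted in the publication "Adversarial Alignment of Multilingual Models for Extracting Temporal Expressions from Text" (RepL4NLP at ACL 2020).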

During my internship at Amazon (2020) I worked on automatic summarization of user reviews.
Stack: Python, TensorFlow, USE (Universal Sentence Encoder), BERT

My master's thesis (2020) addressed the problem of fiction summarization using transformers.
I tried different training and pre-training settings and different architectures (vanilla transformer, GPT-2), looked into the attention distribution, and analyzed errors. The best solution (though still very prone to hallucinations) consisted of a pre-processing step that extracted the "main" sentences, followed by summarization with GPT-2.
The pre-processing step relied on k-means clustering of sentences encoded with the Universal Sentence Encoder. For each paragraph, I took the 2-3 sentences closest to the centroid of a cluster.
Stack: Python, PyTorch, USE (Universal Sentence Encoder), scikit-learn
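The "closest to the centroid" selection can be sketched as follows. This is a simplified illustration rather than the thesis code: it assumes a single cluster per paragraph (so the centroid is just the mean of the embeddings) and uses hand-made 2-D vectors in place of Universal Sentence Encoder embeddings. The names `centroid` and `closest_to_centroid` are hypothetical:

```python
import math


def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]


def closest_to_centroid(sentences, embeddings, top_n=2):
    """Return the top_n sentences nearest to the centroid, in their original order."""
    center = centroid(embeddings)
    ranked = sorted(
        range(len(sentences)),
        key=lambda i: math.dist(embeddings[i], center),
    )
    keep = sorted(ranked[:top_n])  # restore document order
    return [sentences[i] for i in keep]


# Toy paragraph with made-up 2-D "embeddings" (a real encoder returns
# high-dimensional vectors).
paragraph = [
    "It was raining.",
    "The detective entered the house.",
    "He searched every room of the house.",
    "Meanwhile, a ship sank far away.",
]
toy_embeddings = [[0.0, 0.0], [1.0, 1.0], [1.2, 0.9], [5.0, 5.0]]

selected = closest_to_centroid(paragraph, toy_embeddings, top_n=2)
print(selected)
```

With several clusters per paragraph, the same selection would be applied per cluster, for example using the centroids that scikit-learn's `KMeans` exposes after fitting.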

Education

University education

07.2020
Now
PhD Student in Computer Science at University of Stuttgart
03.2020 Master's Degree in Computational Linguistics at University of Stuttgart
Selected coursework: Machine Learning, Reinforcement Learning, Deep Learning for NLP, Advanced Computational Semantics, Advanced Semantics, Lexical Semantics
07.2010 Undergraduate Degree in Computer Science at St. Petersburg State Polytechnic University
07.2008 Undergraduate Degree in Innovation Management at St. Petersburg State Polytechnic University

Summer schools

06.2021 Nordic Probabilistic AI School (ProbAI)
Topics: Probabilistic programming, variational inference, GANs
04.2021 Oxford ML School (OxML)
Topics: Machine Learning for healthcare

Volunteering

08.2024-now Imagine Foundation e.V.
Tasks: Providing career coaching for tech candidates from the MENA region
05.2023-05.2024 FrauenLoop
Tasks: Mentoring in web development for women who want to change careers

Publications

"NILK: Entity Linking Dataset Targeting NIL-linking Cases", Anastasiia Iurshina, Jiaxin Pan, Rafika Boutalbi, Steffen Staab, CIKM 2022
"Tensor-based Graph Modularity for Text Data Clustering", Rafika Boutalbi, Mira Ait-Saada, Anastasiia Iurshina, Steffen Staab, Mohamed Nadif, SIGIR 2022
"Adversarial Alignment of Multilingual Models for Extracting Temporal Expressions from Text", Lukas Lange, Anastasiia Iurshina, Heike Adel, Jannik Strötgen, RepL4NLP at ACL 2020

Books

A list of books I read recently (or not so recently but like to bring up on every possible occasion).

Tech

Culture&Theory

Fiction

Poetry

Pictures

Just to give you an idea of what I do when I'm not debugging or reading.