Ana(stasiia) Iurshina

About

Hi, I'm Ana, a backend developer and machine learning engineer. I have over 7 years of industry experience in software development and around 3.5 years of research experience, primarily in applying deep learning methods to natural language processing tasks such as entity linking, named entity recognition, and automatic text summarization. Most recently, I spent a year as a machine learning engineer in industry.
This is pretty much a clickable version of my CV, plus a little personal info (the Books and Pictures sections).
Apart from pure coding, debugging (I 💛 debugging!) and training models, I'm interested in the following topics:

Open source
Human rights and equal opportunities for all
Cybersecurity, OSINT
Art (involving computers or not)

If you have an idea related to these topics and need technical help or just someone to collaborate with, feel free to contact me (by email: anastasiia.iurshina @ gmail.com or on LinkedIn).

Work Experience

09.2023-now	Senior Machine Learning Engineer at GFT Tasks: made significant contributions to the design and development of a chat application used in production by more than 3,000 users; developed a testing framework for a RAG pipeline; mentored junior developers Environment: Python, LLMs, Azure, FastAPI, Git
07.2023-09.2023	Senior Machine Learning Engineer at Insaas Tasks: prototyping a question answering system with a vector database, an LLM, FastAPI and Streamlit; using an LLM to solve a business task: aspect-based sentiment analysis of product reviews Environment: Python, LLMs, GCP, Git
07.2020-06.2023	Researcher at University of Stuttgart Tasks: research in the area of neural entity linking; deploying and maintaining a demo website (nginx, Docker Compose, SSL); teaching assistant for the "Intro to AI" course; supervision of student theses (e.g. "Extracting and Segmenting High-Variance References from PDF Documents with BERT") Environment: Python, PyTorch, Git, Docker
01.2020-03.2020	Research Intern at Bosch Center for AI Tasks: research project on task-specific named entity recognition in a multilingual setting Environment: Python, PyTorch, Git
09.2019-12.2019	Research Science Intern at Amazon Tasks: research project in the field of automatic summarization Environment: Python, TensorFlow, Pandas, Git, AWS
04.2018-09.2019	Software Developer for NLP at Sony, Speech and Sound group Tasks: investigation and implementation of various text processing tasks (tokenization, sentence segmentation, POS tagging, etc.) Environment: Java, Git, Python, C++
04.2018-09.2019	Research assistant at Institut für Linguistik/Anglistik Tasks: data preprocessing, extraction of sentences with a given structure based on dependency parsing results Environment: Python (NLTK, pandas, spaCy, etc.), MaltParser, Java
07.2014-09.2017	Senior Software Engineer, Team Lead at EPAM Systems Tasks: implementing various design and coding tasks, leading a team of 3 developers Environment: Java, Python, Git, Spring, Jenkins, CXF, REST, Maven, Tomcat, Talend, Virtuoso, PostgreSQL, MyBatis, Apache Ignite, Hadoop, Spark
07.2012-07.2014	Senior Software Engineer at GGA Software Services Project: a system for interacting with liquid handling robot devices. Tasks: implemented robot manipulation logic; applied connected-component search in graphs to a business task Environment: Java, SVN, Oracle, Spring, SOAP, Maven, Tomcat, Mule ESB, Apache Solr
07.2010-07.2012	Software Engineer at GGA Software Services Tasks: developing software for pharmaceutical companies. Environment: Java, SVN, Oracle, Spring, SOAP, Maven, Tomcat, Quartz (job scheduling)
01.2010-07.2010	Junior Software Developer at GGA Software Services Tasks: implementing web services, working with a database. Environment: Java, SVN, Swing, MySQL, Hibernate

Skills

Using (almost) every day:

Python	*****
PyTorch	*****
Natural Language Processing	*****
Machine Learning	*****
Deep Learning	*****
Transformers package	*****
Git	*****
Docker	*****
Python ML/NLP stack (NumPy, spaCy, scikit-learn, etc.)	*****

Significant experience in the past or using from time to time:

Java, SQL, Spring, AWS

Familiar with (used in different projects throughout my career and studies):

Keras, TensorFlow, LaTeX, Bash scripting, HTML, CSS, NoSQL databases, Scala, Hadoop, Spark, Elasticsearch, Solr, Lucene, Kafka, C++

Languages

English	Fluent (C1/C2)
Russian	Native
German	Limited (B1+)
Arabic	Basic (A1/A2)

Recent projects

The NIL-linking task in entity linking deals with text mentions that have no corresponding entity in the associated knowledge base. We created a dataset for this task.
To build it, we compared two Wikidata snapshots taken at different times and identified entities that are present in the later snapshot but absent from the earlier one. For these entities, we extracted the corresponding text from Wikipedia.
More information can be found in the paper.
The dataset is available on Zenodo.
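The snapshot comparison boils down to a set difference over entity IDs. The sketch below only illustrates that idea and is not the code used to build the dataset: it assumes each snapshot has already been reduced to a dictionary from Wikidata IDs (QIDs) to the title of the linked Wikipedia article, and the function names and input format are hypothetical.

```python
from typing import Dict, List

# Hypothetical input format: {Wikidata QID: title of the linked Wikipedia article}.
Snapshot = Dict[str, str]


def find_new_entities(old_snapshot: Snapshot, new_snapshot: Snapshot) -> List[str]:
    """Return the IDs present in the newer snapshot but absent from the older one."""
    # dict.keys() views support set operations, so this is a plain set difference.
    return sorted(new_snapshot.keys() - old_snapshot.keys())


def pages_to_extract(old_snapshot: Snapshot, new_snapshot: Snapshot) -> Dict[str, str]:
    """Map each newly added entity to the Wikipedia page whose text would be extracted."""
    return {qid: new_snapshot[qid] for qid in find_new_entities(old_snapshot, new_snapshot)}
```

For example, with a toy old snapshot `{"Q1": "Universe", "Q2": "Earth"}` and a new one that additionally contains `"Q3": "New Topic"`, `find_new_entities` returns `["Q3"]`. Full Wikidata dumps are large, so in practice a streaming pass over the dump files would likely replace the in-memory dictionaries.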

A prototype of a pipeline for processing .tmx files (TMX, Translation Memory eXchange, is an XML-based format for storing and exchanging translations). It relies on a producer-consumer architecture implemented with Kafka.
Stack: Python, Kafka, Docker
Code on GitHub
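The producer-consumer structure can be illustrated without a running Kafka cluster. In this sketch, Python's in-process `queue.Queue` deliberately stands in for a Kafka topic so the example runs on its own; a producer thread parses translation units (`<tu>` elements holding one `<tuv>` per language, with the text in `<seg>`) out of a TMX document, and a consumer thread processes them. The function names are hypothetical, and the "processing" step is just collection.

```python
import queue
import threading
import xml.etree.ElementTree as ET
from typing import Dict, List

# ElementTree expands the reserved xml: prefix to this namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"
_DONE = object()  # sentinel that tells the consumer to stop


def parse_tmx(tmx_text: str) -> List[Dict[str, str]]:
    """Turn each <tu> (translation unit) into a {language: segment text} dict."""
    root = ET.fromstring(tmx_text)
    units = []
    for tu in root.iter("tu"):
        unit = {}
        for tuv in tu.findall("tuv"):
            seg = tuv.find("seg")
            # Fall back to a plain "lang" attribute in case a file uses one.
            lang = tuv.get(XML_LANG) or tuv.get("lang")
            if seg is not None and lang:
                # itertext() also collects text inside inline tags; not handled specially here.
                unit[lang] = "".join(seg.itertext()).strip()
        units.append(unit)
    return units


def producer(tmx_text: str, q: queue.Queue) -> None:
    """Parse the document and publish one message per translation unit."""
    for unit in parse_tmx(tmx_text):
        q.put(unit)
    q.put(_DONE)


def consumer(q: queue.Queue, results: List[Dict[str, str]]) -> None:
    """Read messages until the sentinel arrives; 'processing' is just collecting here."""
    while True:
        item = q.get()
        if item is _DONE:
            break
        results.append(item)


def run_pipeline(tmx_text: str) -> List[Dict[str, str]]:
    # Bounded queue, so the producer blocks if the consumer falls behind.
    q: queue.Queue = queue.Queue(maxsize=100)
    results: List[Dict[str, str]] = []
    workers = [
        threading.Thread(target=consumer, args=(q, results)),
        threading.Thread(target=producer, args=(tmx_text, q)),
    ]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return results
```

Swapping the queue for real Kafka producer and consumer clients would roughly keep the same shape: the topic replaces `q`, and a consumer group lets several consumers share the work.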

In this work, we explored multilingual methods for extracting temporal expressions from text and investigated adversarial training for aligning the embedding spaces of different languages into one common space.
The work resulted in the publication.

During my internship at Amazon (2020), I worked on automatic summarization of user reviews.
Stack: Python, TensorFlow, USE (Universal Sentence Encoder), BERT

My master's thesis (2020) addressed the problem of fiction summarization using transformers.
I tried different training and pre-training settings and different architectures (vanilla Transformer, GPT-2), examined the attention distribution, and analyzed errors. The best solution (though still very prone to hallucinations) consisted of a pre-processing step that extracted the "main" sentences, followed by summarization with GPT-2.
The pre-processing step relied on k-means clustering of sentences encoded with the Universal Sentence Encoder. For each paragraph, I took the 2-3 sentences closest to the centroid of a cluster.
Stack: Python, PyTorch, USE (Universal Sentence Encoder), scikit-learn
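The sentence-selection step can be sketched as follows. This is one possible reading of the description above, under simplifying assumptions: toy 2-D vectors stand in for Universal Sentence Encoder embeddings, k-means is a plain Lloyd's loop initialised with the first k sentences (rather than, say, scikit-learn's `KMeans` with k-means++ initialisation), and for each cluster the sentence(s) nearest the centroid are kept. All names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def _mean(points: List[Vector]) -> List[float]:
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(coords) / len(points) for coords in zip(*points)]


def kmeans(points: List[Vector], k: int, iters: int = 20) -> Tuple[List[List[float]], List[int]]:
    """Plain Lloyd's k-means, initialised with the first k points (assumes k <= len(points))."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid (Euclidean distance).
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            new_centroids.append(_mean(members) if members else centroids[c])
        if new_centroids == centroids:  # converged
            break
        centroids = new_centroids
    return centroids, labels


def select_main_sentences(sentences: List[str], embeddings: List[Vector],
                          k: int = 2, per_cluster: int = 1) -> List[str]:
    """Keep the sentence(s) closest to each cluster centroid, in original order."""
    centroids, labels = kmeans(embeddings, k)
    chosen: List[int] = []
    for c, centroid in enumerate(centroids):
        members = [i for i, lab in enumerate(labels) if lab == c]
        members.sort(key=lambda i: math.dist(embeddings[i], centroid))
        chosen.extend(members[:per_cluster])
    return [sentences[i] for i in sorted(chosen)]
```

With k set to 2 or 3 and one sentence per cluster, this yields 2-3 "main" sentences per paragraph, which would then be passed on to the summarization model.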

Education

University education

07.2020-now	PhD Student in Computer Science at University of Stuttgart
03.2020	Master's Degree in Computational Linguistics at University of Stuttgart Selected coursework: Machine Learning, Reinforcement Learning, Deep Learning for NLP, Advanced Computational Semantics, Advanced Semantics, Lexical Semantics
07.2010	Undergraduate Degree in Computer Science at St. Petersburg State Polytechnic University
07.2008	Undergraduate Degree in Innovation Management at St. Petersburg State Polytechnic University

Summer schools

06.2021	Nordic Probabilistic AI School (ProbAI) Topics: Probabilistic programming, variational inference, GANs
04.2021	Oxford ML School (OxML) Topics: Machine Learning for healthcare

Volunteering

08.2024-now	Imagine Foundation e.V. Tasks: Providing career coaching for tech candidates from the MENA region
05.2023-05.2024	FrauenLoop Tasks: Mentoring in web development for women who want to change careers

Publications

"NILK: Entity Linking Dataset Targeting NIL-linking Cases", Anastasiia Iurshina, Jiaxin Pan, Rafika Boutalbi, Steffen Staab, CIKM 2022
"Tensor-based Graph Modularity for Text Data Clustering", Rafika Boutalbi, Mira Ait-Saada, Anastasiia Iurshina, Steffen Staab, Mohamed Nadif, SIGIR 2022
"Adversarial Alignment of Multilingual Models for Extracting Temporal Expressions from Text", Lukas Lange, Anastasiia Iurshina, Heike Adel, Jannik Strötgen, RepL4NLP at ACL 2020

Books

A list of books I read recently (or not so recently but like to bring up on every possible occasion).

Tech

"Designing Machine Learning Systems: An Iterative Process for Production-Ready Applications", Chip Huyen
"Machine Learning Design Patterns", V. Lakshmanan, S. Robinson, M. Munn
"The Hundred-Page Machine Learning Book", Andriy Burkov
"Pro Git", Scott Chacon and Ben Straub
"The Rust Programming Language", Steve Klabnik and Carol Nichols
"Introduction to Information Retrieval", Christopher D. Manning, Prabhakar Raghavan, Hinrich Schütze

Culture & Theory

"Cultural Studies: Theory and Practice", Chris Barker
"American Originality: Essays on Poetry", Louise Glück
"The Queer Art of Failure", J. Jack Halberstam
"Metamodernism: Historicity, Affect, and Depth After Postmodernism", Robin van den Akker
"Technofeminism", Judy Wajcman

Fiction

"Cold Enough for Snow", Jessica Au
"Light in August", William Faulkner
"Infinite Jest", David Foster Wallace
"White Noise", Don DeLillo
"The Sailor Who Fell from Grace with the Sea", Yukio Mishima

Poetry

"Winter Recipes from the Collective", Louise Glück
"All My Pretty Ones", Anne Sexton
"Dream Work", Mary Oliver
"Dancing in Odessa", Ilya Kaminsky

Pictures

Just to give you an idea of what I do when I'm not debugging or reading.