Ana(stasiia) Iurshina
About
Hi, I'm Ana, a backend developer and machine learning engineer. I have over 7 years of industry experience in software development and around 3.5 years of research experience, primarily in applying deep learning methods to natural language processing tasks such as entity linking, named entity recognition, and automatic text summarization. Most recently, I spent a year as a machine learning engineer in industry.
This is pretty much a clickable version of my CV, plus a tiny addition of some personal info (Books and Pictures sections).
Apart from pure coding, debugging (I 💛 debugging!) and training models, I'm interested in the following topics:
- Open source
- Human rights and equal opportunities for all
- Cybersecurity, OSINT
- Art (involving computers or not)
Work Experience
| 09.2023-now | Senior Machine Learning Engineer at GFT. Tasks: contributed significantly to the design and development of a chat application that is used in production by more than 3000 users; developed a testing framework for a RAG pipeline; mentored junior developers. Environment: Python, LLMs, Azure, FastAPI, Git |
| 07.2023-09.2023 | Senior Machine Learning Engineer at Insaas. Tasks: prototyping a question answering system with a vector database, an LLM, FastAPI and Streamlit; using an LLM to solve a business task (aspect-based sentiment analysis of product reviews). Environment: Python, LLMs, GCP, Git |
| 07.2020-06.2023 | Researcher at University of Stuttgart. Tasks: research in the area of neural entity linking; deploying and supporting a website with a demo (nginx, Docker Compose, SSL); teaching assistant for the "Intro to AI" course; supervision of student theses (e.g. "Extracting and Segmenting High-Variance References from PDF Documents with BERT"). Environment: Python, PyTorch, Git, Docker |
| 01.2020-03.2020 | Research Intern at Bosch Center for AI. Tasks: research project on task-specific named entity recognition in a multilingual setting. Environment: Python, PyTorch, Git |
| 09.2019-12.2019 | Research Science Intern at Amazon. Tasks: research project in the field of automatic summarization. Environment: Python, TensorFlow, Pandas, Git, AWS |
| 04.2018-09.2019 | Software Developer for NLP at Sony, Speech and Sound group. Tasks: investigation and implementation of different text processing tasks (tokenization, sentence segmentation, POS tagging, etc.). Environment: Java, Git, Python, C++ |
| 04.2018-09.2019 | Research Assistant at Institut für Linguistik/Anglistik. Tasks: data preprocessing, extraction of sentences of a certain structure based on dependency parsing results. Environment: Python (nltk, pandas, spaCy, etc.), MaltParser, Java |
| 07.2014-09.2017 | Senior Software Engineer, Team Lead at EPAM Systems. Tasks: implementing various design and coding tasks, leading a team of 3 developers. Environment: Java, Python, Git, Spring, Jenkins, CXF, REST, Maven, Tomcat, Talend, Virtuoso, PostgreSQL, MyBatis, Apache Ignite, Hadoop, Spark |
| 07.2012-07.2014 | Senior Software Engineer at GGA Software Services. Project: a system for interacting with liquid handling robot devices and. Tasks: implemented robot manipulation logic, applied a search for connected components in a graph to a business task. Environment: Java, SVN, Oracle, Spring, SOAP, Maven, Tomcat, Mule ESB, Apache Solr |
| 07.2010-07.2012 | Software Engineer at GGA Software Services. Tasks: developing software for pharmaceutical companies. Environment: Java, SVN, Oracle, Spring, SOAP, Maven, Tomcat, Quartz for job scheduling |
| 01.2010-07.2010 | Junior Software Developer at GGA Software Services. Tasks: implementing web services, working with a database. Environment: Java, SVN, Swing, MySQL, Hibernate |
Skills
Using (almost) every day:
| Python | ***** | 
| PyTorch | ***** | 
| Natural Language Processing | ***** | 
| Machine Learning | ***** | 
| Deep Learning | ***** | 
| Transformers package | ***** | 
| Git | ***** | 
| Docker | ***** | 
| Python ML/NLP stack (numpy, spacy, scikit-learn, etc) | ***** | 
Significant experience in the past or using from time to time:
Java, SQL, Spring, AWS
Familiar with (used in different projects throughout my career and studies):
Keras, TensorFlow, LaTeX, bash scripting, HTML, CSS, NoSQL databases, Scala, Hadoop, Spark, Elasticsearch, Solr, Lucene, Kafka, C++
Languages
| English | Fluent (C1/C2) | 
| Russian | Native | 
| German | Limited (B1+) | 
| Arabic | Basic (A1/A2) | 
Recent projects
The NIL-linking task in entity linking deals with cases where text mentions have no corresponding entity in the associated knowledge base. We created a dataset for this task. To do so, we compared two snapshots of Wikidata (from different timestamps) and found entities that are in the later snapshot but absent from the older one. For these entities, we extracted the corresponding text from Wikipedia.
More information can be found in the paper.
The dataset is available at Zenodo.
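The snapshot comparison described above boils down to a set difference over entity IDs. The following is only a minimal sketch of that idea, not the code used for the dataset: the function name `new_entities` and the tiny in-memory snapshots (ID to label dictionaries) are assumptions made for illustration, and real Wikidata dumps would need streaming rather than in-memory dictionaries.

```python
def new_entities(older_snapshot, later_snapshot):
    """Return entities found in the later snapshot but absent from the older one.

    Both snapshots are assumed (for illustration) to be dicts mapping an
    entity ID to a label.
    """
    added_ids = set(later_snapshot) - set(older_snapshot)
    return {entity_id: later_snapshot[entity_id] for entity_id in sorted(added_ids)}


if __name__ == "__main__":
    # Hypothetical toy snapshots; the IDs and labels are made up.
    older = {"Q1": "entity one", "Q2": "entity two"}
    later = {"Q1": "entity one", "Q2": "entity two", "Q3": "entity three"}
    print(new_entities(older, later))
```

Entities returned this way are candidates for NIL mentions with respect to the older snapshot, since a linker restricted to that snapshot cannot resolve them.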
            
A prototype of a pipeline for processing .tmx files (XML-like files used for translation annotation). It relies on a producer-consumer architecture (implemented using Kafka).
Stack: Python, Kafka, Docker
Code on GitHub
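To illustrate the producer-consumer idea, here is a minimal sketch that is not the project's code: it swaps Kafka for Python's in-process `queue.Queue` so that it runs without a broker, and the function names (`produce`, `consume`, `run_pipeline`) are hypothetical. It assumes TMX 1.4-style input, where each `tu` element holds `tuv` variants tagged with `xml:lang`.

```python
import queue
import threading
import xml.etree.ElementTree as ET

# ElementTree expands the predefined xml: prefix to this namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"
SENTINEL = None  # marks the end of the stream


def produce(tmx_text, out_queue):
    """Parse TMX text and enqueue one {language: segment} dict per translation unit."""
    root = ET.fromstring(tmx_text)
    for unit in root.iter("tu"):
        pair = {}
        for variant in unit.iter("tuv"):
            seg = variant.find("seg")
            pair[variant.get(XML_LANG)] = "".join(seg.itertext()) if seg is not None else ""
        out_queue.put(pair)
    out_queue.put(SENTINEL)


def consume(in_queue, results):
    """Collect translation units until the sentinel arrives."""
    while True:
        item = in_queue.get()
        if item is SENTINEL:
            break
        results.append(item)


def run_pipeline(tmx_text):
    """Run one producer and one consumer thread over a shared queue."""
    units = queue.Queue()
    results = []
    consumer = threading.Thread(target=consume, args=(units, results))
    consumer.start()
    produce(tmx_text, units)
    consumer.join()
    return results
```

With Kafka, the queue would become a topic, and producers and consumers would run as separate processes or containers, which allows each side to be scaled independently.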
            
In this work, we explored multilingual methods for extracting temporal expressions from text and investigated adversarial training for aligning embedding spaces into one common space. The work resulted in a publication (RepL4NLP at ACL 2020).
During my internship at Amazon (2020), I worked on automatic summarization of user reviews.
Stack: Python, TensorFlow, USE (Universal Sentence Encoder), BERT
            
My master's thesis (2020) addressed the problem of fiction summarization using transformers. I tried different training and pre-training settings and different architectures (vanilla transformer, GPT-2), looked into the attention distribution, and analyzed errors. The best solution (though still very prone to hallucinations) consisted of a pre-processing step that extracted "main" sentences, followed by summarization with GPT-2.
The pre-processing step relied on k-means clustering of sentences encoded with the Universal Sentence Encoder. For each paragraph, I took the 2-3 sentences closest to the centroid of a cluster.
Stack: Python, PyTorch, USE (Universal Sentence Encoder), scikit-learn
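The sentence-selection step can be sketched roughly as follows. This is a simplified illustration, not the thesis code: a toy bag-of-words vector stands in for the Universal Sentence Encoder, and a single cluster per paragraph is assumed, in which case the k-means centroid is just the mean of the sentence vectors. The names `embed` and `closest_to_centroid` are hypothetical.

```python
import math
import re
from collections import Counter

WORD = re.compile(r"[a-z']+")


def embed(sentence, vocabulary):
    """Toy bag-of-words vector; an assumed stand-in for a real sentence encoder."""
    counts = Counter(WORD.findall(sentence.lower()))
    return [float(counts[word]) for word in vocabulary]


def closest_to_centroid(sentences, top_n=2):
    """Return the top_n sentences nearest to the mean vector, in original order."""
    vocabulary = sorted({w for s in sentences for w in WORD.findall(s.lower())})
    vectors = [embed(s, vocabulary) for s in sentences]
    # With one cluster, the k-means centroid equals the mean of all vectors.
    centroid = [
        sum(v[i] for v in vectors) / len(vectors) for i in range(len(vocabulary))
    ]
    distances = [math.dist(v, centroid) for v in vectors]
    ranked = sorted(range(len(sentences)), key=lambda i: distances[i])
    return [sentences[i] for i in sorted(ranked[:top_n])]


if __name__ == "__main__":
    paragraph = ["the cat sleeps", "the cat", "the dog"]
    print(closest_to_centroid(paragraph, top_n=2))
```

Keeping the selected sentences in their original order preserves the narrative flow of the paragraph before the text is passed on to the summarizer.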
            
Education
University education
| 07.2020-now | PhD Student in Computer Science at University of Stuttgart |
| 03.2020 | Master's Degree in Computational Linguistics at University of Stuttgart. Selected coursework: Machine Learning, Reinforcement Learning, Deep Learning for NLP, Advanced Computational Semantics, Advanced Semantics, Lexical Semantics |
| 07.2010 | Undergraduate Degree in Computer Science at St. Petersburg State Polytechnic University |
| 07.2008 | Undergraduate Degree in Innovation Management at St. Petersburg State Polytechnic University |
Summer schools
| 06.2021 | Nordic Probabilistic AI School (ProbAI) Topics: Probabilistic programming, variational inference, GANs | 
| 04.2021 | Oxford ML School (OxML) Topics: Machine Learning for healthcare | 
Volunteering
| 08.2024-now | Imagine Foundation e.V. Tasks: Providing career coaching for tech candidates from the MENA region | 
| 05.2023-05.2024 | FrauenLoop Tasks: Mentoring in web development for women who want to change careers | 
Publications
"NILK: Entity Linking Dataset Targeting NIL-linking Cases", Anastasiia Iurshina, Jiaxin Pan, Rafika Boutalbi, Steffen Staab, CIKM 2022
"Tensor-based Graph Modularity for Text Data Clustering", Rafika Boutalbi, Mira Ait-Saada, Anastasiia Iurshina, Steffen Staab, Mohamed Nadif, SIGIR 2022
"Adversarial Alignment of Multilingual Models for Extracting Temporal Expressions from Text", Lukas Lange, Anastasiia Iurshina, Heike Adel, Jannik Strötgen, RepL4NLP at ACL 2020
Books
A list of books I read recently (or not so recently but like to bring up on every possible occasion).
Tech
- "Designing Machine Learning Systems: An Iterative Process for Production-Ready Applications", Chip Huyen
- "Machine Learning Design Patterns", V. Lakshmanan, S. Robinson, M. Munn
- "The Hundred-Page Machine Learning Book", Andriy Burkov
- "Pro Git", Scott Chacon and Ben Straub
- "Rust", Steve Klabnik and Carol Nichols
- "Introduction to Information Retrieval", Christopher D. Manning
Culture&Theory
- "Cultural Studies. Theory and Practice", Chris Barker
- "American Originality: Essays on Poetry", Louise Glück
- "The Queer Art of Failure", J. Jack Halberstam
- "Metamodernism: Historicity, Affect, and Depth After Postmodernism", Robin van den Akker
- "Technofeminism", Judy Wajcman
Fiction
- "Cold Enough for Snow", Jessica Au
- "Light in August", William Faulkner
- "Infinite Jest", David Foster Wallace
- "White Noise", Don DeLillo
- "The Sailor Who Fell from Grace with the Sea", Yukio Mishima
Poetry
- "Winter Recipes from the Collective", Louise Glück
- "All My Pretty Ones", Anne Sexton
- "Dream Work", Mary Oliver
- "Dancing in Odessa", Ilya Kaminsky