Oleksandr Parshakov

Legacy Projects

CIFAR-10 Image Classification with Deep Learning

(2021)

Summary: This project explores deep learning for a computer vision task on the CIFAR-10 dataset, a standard benchmark for 10-class image classification. The project proceeds in two phases. The initial phase experiments with a basic Convolutional Neural Network (CNN) of just two convolutional layers, using ReLU and Sigmoid activations, and with Multi-Layer Perceptrons (MLPs), to demonstrate the importance of convolutional layers for image data and the relative performance of different activation functions; this initial CNN achieved only 63% accuracy. The second phase develops a more complex CNN architecture, drawing on VGG-style networks for its depth and filter sizes, with the goal of exceeding 80% accuracy on a personal computer equipped with a single GPU. By incorporating techniques like batch normalisation and multi-scale convolutional filters, the final model achieved 88% accuracy on the CIFAR-10 test set, surpassing the initial goal and demonstrating the effectiveness of the chosen approach within the given resource constraints.

GitHub repo: github.com/lzrdGreen/Models-for-CIFAR-10

Relevant skills: Python, PyTorch, Scikit-Learn, matplotlib, numpy, pandas

Figure: Loss for training and validation sets.
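The exact architecture lives in the repo; as a flavour of the approach, here is a minimal PyTorch sketch of the two key ingredients, batch normalisation and multi-scale filters (parallel 3x3 and 5x5 convolutions), arranged in a VGG-style stack. Layer widths and depth are illustrative, not the repo's exact configuration.

```python
import torch
import torch.nn as nn

class MultiScaleBlock(nn.Module):
    """Parallel 3x3 and 5x5 convolutions, concatenated, then batch-normalised."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.conv3 = nn.Conv2d(in_ch, out_ch // 2, kernel_size=3, padding=1)
        self.conv5 = nn.Conv2d(in_ch, out_ch // 2, kernel_size=5, padding=2)
        self.bn = nn.BatchNorm2d(out_ch)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        x = torch.cat([self.conv3(x), self.conv5(x)], dim=1)
        return self.act(self.bn(x))

class Cifar10Net(nn.Module):
    """VGG-inspired stack: widening blocks separated by 2x2 max pooling."""
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            MultiScaleBlock(3, 64),    nn.MaxPool2d(2),  # 32x32 -> 16x16
            MultiScaleBlock(64, 128),  nn.MaxPool2d(2),  # 16x16 -> 8x8
            MultiScaleBlock(128, 256), nn.MaxPool2d(2),  # 8x8  -> 4x4
        )
        self.classifier = nn.Linear(256 * 4 * 4, num_classes)

    def forward(self, x):
        return self.classifier(torch.flatten(self.features(x), 1))

# Quick shape check on a dummy CIFAR-10 batch.
print(Cifar10Net()(torch.randn(2, 3, 32, 32)).shape)  # torch.Size([2, 10])
```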


Application of BERT, a Transformer-based language model, to check the correctness of a sentence in English

(September 2021)

Summary: This project tackled grammatical error detection in English using Natural Language Processing (NLP) by fine-tuning a pre-trained BERT model on the CoLA dataset. The implementation was completed on a personal computer with a single GTX 1070 GPU, demonstrating that advanced NLP techniques are accessible without high-performance computing clusters. By validating the model's performance on real-world examples, the project showcased BERT's potential for nuanced linguistic tasks, deepened my understanding of fine-tuning techniques, and paved the way for practical applications like grammar checkers and language learning tools.

GitHub repo: English Grammar Tester

Relevant skills: Python, PyTorch, Scikit-Learn, matplotlib, numpy, pandas
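The original notebook is in the repo; a minimal sketch of the CoLA fine-tuning setup, written against today's Hugging Face transformers and datasets APIs (the 2021 code may differ in detail), could look like this:

```python
import torch
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

# CoLA labels each sentence as grammatically acceptable (1) or unacceptable (0).
cola = load_dataset("glue", "cola")
cola = cola.map(lambda batch: tokenizer(batch["sentence"], truncation=True,
                                        padding="max_length", max_length=64),
                batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="bert-cola", num_train_epochs=3,
                           per_device_train_batch_size=16),
    train_dataset=cola["train"],
    eval_dataset=cola["validation"],
)
trainer.train()

# Check an arbitrary sentence.
inputs = tokenizer("She don't like apples.", return_tensors="pt").to(model.device)
with torch.no_grad():
    label = model(**inputs).logits.argmax(-1).item()
print("acceptable" if label == 1 else "unacceptable")
```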


LLM Fine-Tuning

In 2024-25, compact LLMs emerged as a promising area of research.

My 2021 project, "Application of BERT to check the correctness of a sentence," has found an unexpected continuation. BERT, a masked language model typically fine-tuned for tasks such as question answering and sentence classification, represented one approach; generative LLMs now offer an alternative path.

I'm exploring these models through hands-on training on my personal laptop, which limits me to small-scale/compact LLMs.

GRPO Fine-Tuning of Gemma 3-1B-it

(March 2025)

Summary: "Tiny Large Language Models (LLMs) like Qwen2.5-0.5B and TinyLlama-1.1B seem to lack reasoning capabilities. This study explores fine-tuning of Gemma 3-1B-it, the smallest in the recent Google family, using GRPO and a targeted reward system, on the 'causal_judgement' subset of the BBH dataset. This resulted in a promising accuracy improvement, demonstrating the model's enhanced reasoning capabilities.

Fine-Tuning of the Qwen2.5-0.5B-Instruct Model

(February 2025)

Qwen2.5-0.5B-Instruct is a tiny LLM with half a billion parameters. This study compares various fine-tuning approaches (SFT and DPO, both independently and in combination) against each other and against the baseline model.

I assessed the fine-tuning techniques (Table 1 below) using perplexity (PPL) as the evaluation metric; PPL quantifies how well the model predicts the next token, with lower scores indicating better performance. Each model version was evaluated on a diverse set of prompts and compared against the baseline Qwen2.5-0.5B-Instruct model without fine-tuning.
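For reference, PPL follows directly from the model's next-token cross-entropy loss. A minimal sketch with the Hugging Face stack is below; the notebooks' exact evaluation protocol (in particular, which text the loss is taken over) may differ, so these numbers will not reproduce Table 1.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen2.5-0.5B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id).eval()

@torch.no_grad()
def perplexity(text: str) -> float:
    enc = tokenizer(text, return_tensors="pt")
    # Cross-entropy of the model's next-token predictions; PPL = exp(loss).
    loss = model(**enc, labels=enc["input_ids"]).loss
    return torch.exp(loss).item()

for prompt in ["What is AI?", "Give me three facts about London."]:
    print(f"{prompt!r}: PPL = {perplexity(prompt):.2f}")
```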

I first investigated two approaches: Supervised Fine-Tuning (SFT) on the conversational HuggingFaceH4/ultrachat_200k dataset, and Direct Preference Optimisation (DPO) on the argilla/distilabel-intel-orca-dpo-pairs dataset of accepted/rejected response pairs. Contrary to the common belief that human preference data reliably improves LLM performance, DPO showed virtually no improvement over the baseline and proved challenging for my RTX 4080 GPU: the session with the best evaluation_loss ended in a runtime error near completion, and more stable settings yielded slightly worse evaluation_loss results. In contrast, SFT demonstrated significant improvements, producing the best models according to PPL metrics. The same Jupyter notebook contains evaluation runs for both the baseline model and the pure DPO model loaded from its best checkpoint.
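A compressed sketch of how such an SFT stage and a DPO-after-SFT stage can be wired up with TRL follows (whether via TRL trainers or a manual loop, the flow is the same; the dataset slice, column renames, and hyperparameters here are illustrative):

```python
from datasets import load_dataset
from trl import DPOConfig, DPOTrainer, SFTConfig, SFTTrainer

base = "Qwen/Qwen2.5-0.5B-Instruct"

# SFT on UltraChat; SFTTrainer consumes the conversational "messages" column.
sft_ds = load_dataset("HuggingFaceH4/ultrachat_200k", split="train_sft[:1%]")
sft = SFTTrainer(model=base, train_dataset=sft_ds,
                 args=SFTConfig(output_dir="qwen-sft"))
sft.train()
sft.save_model("qwen-sft/final")  # starting point for the DPO-after-SFT variant

# DPO on accepted/rejected pairs; the dataset's "input" column becomes the
# DPO "prompt" (extra columns are dropped during the trainer's preprocessing).
dpo_ds = load_dataset("argilla/distilabel-intel-orca-dpo-pairs", split="train")
dpo_ds = dpo_ds.rename_column("input", "prompt")
dpo = DPOTrainer(model="qwen-sft/final", train_dataset=dpo_ds,
                 args=DPOConfig(output_dir="qwen-dpo-after-sft"))
dpo.train()
```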

Given DPO's discouraging results, I tested applying DPO after initial SFT (see the DPOafterSFT notebook). This degraded PPL relative to pure SFT, though far less severely than pure DPO. I then applied a second round of SFT to create an SFT-DPO-SFT sequence, which yielded significant improvements, nearly matching pure SFT. However, determining which metrics best reflect human preferences remains challenging.

Notably, all models performed strongest on creative writing tasks (e.g., "Write a scene from a play..."). While most factual prompts received good PPL scores, the specific prompt "Give me three facts about London" proved challenging for all models, possibly because the abundance of potential facts makes selection difficult.

Table 1.

Perplexity for Fine-Tuned Qwen2.5-0.5B-Instruct Model Using Various Techniques

| Prompt | SFT | DPO | DPO after SFT | SFT-DPO-SFT | Baseline |
| --- | --- | --- | --- | --- | --- |
| What is AI? | 88.66 | 178.1 | 104.6 | 93.62 | 179.5 |
| Tell me something interesting about Albert Einstein. | 60.97 | 124.8 | 74.59 | 61.76 | 118.6 |
| Tell me something about Large Language Models. | 86.54 | 120.5 | 94.39 | 87.35 | 119.8 |
| What is geometry? Explain it step by step. | 50.75 | 80.79 | 60.04 | 54.14 | 80.46 |
| Explain the concept of entropy in simple terms. | 42.76 | 60.68 | 44.75 | 42.43 | 60.53 |
| Tell me something about Jean Baudrillard. | 50.86 | 82.91 | 55.48 | 51.20 | 81.40 |
| Who was David Hilbert? | 91.98 | 179.7 | 131.0 | 99.74 | 176.0 |
| Give me three facts about London. | 108.7 | 204.4 | 120.2 | 109.3 | 200.9 |
| Tell a short story about enemies who eventually became friends, why did it happen? | 86.02 | 117.0 | 97.68 | 86.90 | 114.8 |
| Write a scene from a play where two men are having a philosophical debate about the nature of consciousness. | 24.19 | 32.03 | 25.83 | 24.79 | 31.64 |
| Imagine you are a time traveler who has just arrived in the remote future. Describe what you observe that is significantly different from today. | 27.74 | 36.18 | 30.26 | 28.76 | 36.16 |
| Tell me something about love. | 76.32 | 138.5 | 87.30 | 77.46 | 136.0 |
