Past work

Deep search on Canadian NI 43-101 documentation

Mining AI/ML Python RL

NI 43-101 technical reports are documents required under the Canadian reporting standard for mining and mineral exploration companies registered in Canada. They contain a lot of information about the state of exploration and mining work carried out on projects across the world. However, searching these documents effectively is a huge challenge. Deep search for NI 43-101 reports used reinforcement learning and large language models to pick out and extract the best available data across millions of pages of documents. This helped researchers answer questions about historical data and find connections across disparate data sources and mineral exploration projects.

Geospatial tenements tooling

GeoJSON React Data pipelines

A recurring problem in the mineral exploration industry is keeping up to date with changes in the ownership rights of mineral exploration stakes. It is highly beneficial to know when a stake is set to expire and which properties are currently available. This information can drastically improve the efficiency of exploration programs by allowing researchers to quickly claim properties they are interested in. This tool was built to let users search for tenements across major mining jurisdictions such as Australia and Canada. It allowed them to set areas or properties of interest, search through competitors' staking activity, and compare open tenements with existing public data published in NI 43-101 and JORC documents.
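As a rough illustration of the expiry check described above, the sketch below filters a GeoJSON feature collection for tenements expiring within a given window. The property names (`tenement_id`, `holder`, `expiry_date`), the sample data, and the function name `expiring_within` are all assumptions made for this example; real registries publish their own schemas.

```python
import json
from datetime import date, timedelta

# Hypothetical sample data. The property names below are assumptions,
# not a real registry schema.
SAMPLE = json.loads("""
{
  "type": "FeatureCollection",
  "features": [
    {"type": "Feature",
     "geometry": {"type": "Point", "coordinates": [121.5, -30.7]},
     "properties": {"tenement_id": "T-001", "holder": "Example Co", "expiry_date": "2024-03-15"}},
    {"type": "Feature",
     "geometry": {"type": "Point", "coordinates": [-81.3, 48.5]},
     "properties": {"tenement_id": "T-002", "holder": "Sample Ltd", "expiry_date": "2025-01-01"}},
    {"type": "Feature",
     "geometry": {"type": "Point", "coordinates": [119.0, -28.1]},
     "properties": {"tenement_id": "T-003", "holder": "Example Co", "expiry_date": "2024-03-05"}},
    {"type": "Feature",
     "geometry": {"type": "Point", "coordinates": [-79.9, 47.2]},
     "properties": {"tenement_id": "T-004", "holder": "Other Inc", "expiry_date": "2024-02-20"}}
  ]
}
""")


def expiring_within(collection, today, days):
    """Return features expiring in [today, today + days], soonest first."""
    cutoff = today + timedelta(days=days)
    hits = []
    for feature in collection["features"]:
        expiry = date.fromisoformat(feature["properties"]["expiry_date"])
        if today <= expiry <= cutoff:
            hits.append(feature)
    # ISO dates sort correctly as strings.
    return sorted(hits, key=lambda f: f["properties"]["expiry_date"])


if __name__ == "__main__":
    for f in expiring_within(SAMPLE, date(2024, 3, 1), 30):
        print(f["properties"]["tenement_id"], f["properties"]["expiry_date"])
```

Already-expired tenements are excluded here on purpose. A real tool would probably treat those as a separate "open ground" query.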

Seq-2-Seq Recurrent neural net, NER Model for keyword classification

Python PyTorch Deep learning Recurrent neural nets Seq-2-Seq Synthetic data Named entity recognition Large language models Fine tuning

This is a recurrent neural network (RNN) built with PyTorch. It is a sequence-to-sequence (seq2seq) model that takes tokenized text and predicts a class for each word from a fixed classification vocabulary. The model was built and trained from scratch, using a novel method of constructing a synthetic training set with fine-tuned large language models (LLMs). This approach bootstrapped the model to high performance. It is well suited to low-latency text classification on specific vocabularies.
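To make the input format concrete, here is a minimal sketch of how token and label sequences might be turned into padded integer ids before being fed to a tagger of this kind. It covers only the data preparation step, not the PyTorch model. The example sentences, the label names (`COMMODITY`, `LOCATION`, `O`), and the helper names are assumptions for illustration and do not come from the original project.

```python
PAD, UNK = "<pad>", "<unk>"


def build_vocab(sequences):
    """Assign an integer id to every distinct item, reserving 0 and 1."""
    vocab = {PAD: 0, UNK: 1}
    for seq in sequences:
        for item in seq:
            vocab.setdefault(item, len(vocab))
    return vocab


def encode(seq, vocab, length):
    """Map items to ids, truncate to `length`, and right-pad with PAD."""
    ids = [vocab.get(item, vocab[UNK]) for item in seq[:length]]
    return ids + [vocab[PAD]] * (length - len(ids))


# Hypothetical synthetic examples: one label per token.
sentences = [["gold", "assay", "at", "Timmins"], ["copper", "grade"]]
labels = [["COMMODITY", "O", "O", "LOCATION"], ["COMMODITY", "O"]]

token_vocab = build_vocab(sentences)
label_vocab = build_vocab(labels)

if __name__ == "__main__":
    for toks, labs in zip(sentences, labels):
        print(encode(toks, token_vocab, 4), encode(labs, label_vocab, 4))
```

Unknown words map to a shared `<unk>` id, so the model can still emit a label for tokens it never saw in the synthetic set.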

Polya - AI mathematics tutor

Python FastAPI OpenAI SymPy React

A mathematics tutoring application with a FastAPI backend that uses SymPy to generate symbolically checked mathematics problems with correct solutions. The application generates questions and uses a fine-tuned large language model (LLM) to guide users through the problem-solving process, following Polya's "How to Solve It" mathematics problem-solving framework. Mathematics problems are written in LaTeX and rendered with KaTeX. The application implements a spaced repetition algorithm for questions and allows users to categorize questions as easy, difficult, or hard. Work has begun on a seq2seq model for solving mathematics problems that are beyond the scope of symbolic checkers.
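The spaced repetition step could look roughly like the sketch below. This is a simplified interval scheduler, not the algorithm used in the application, which the text does not specify. The rating labels and the multipliers are placeholder assumptions.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Interval multipliers as (numerator, denominator) so the arithmetic stays
# in integers. Labels and values are illustrative assumptions.
MULTIPLIER = {"easy": (5, 2), "medium": (6, 5), "hard": (1, 2)}


@dataclass
class Card:
    question: str
    due: date
    interval_days: int = 1


def review(card, rating, today):
    """Rescale the card's interval by the rating and schedule the next review."""
    num, den = MULTIPLIER[rating]
    card.interval_days = max(1, card.interval_days * num // den)
    card.due = today + timedelta(days=card.interval_days)
    return card


if __name__ == "__main__":
    card = Card("Differentiate x**2 * sin(x)", due=date(2024, 1, 1))
    for rating in ["easy", "easy", "hard"]:
        review(card, rating, today=card.due)
        print(rating, card.interval_days, card.due)
```

Easy answers push the next review further out, while hard answers pull it closer, with a floor of one day.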

Micro GitHub Copilot for MoonScript with LoRA/PEFT

Python Deep learning Language models Hugging Face LoRA PEFT EleutherAI Pythia BigCode - The Stack

The goal is to achieve functionality similar to GitHub Copilot, but specifically for a small language called MoonScript. The project uses open-source models and datasets. It fine-tunes an EleutherAI model using low-rank adaptation (LoRA), a parameter-efficient fine-tuning (PEFT) method. The model is then quantized to run in 8-bit, making it suitable for smaller systems.
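For readers unfamiliar with low-rank adaptation, the pure-Python sketch below shows the core idea: the frozen weight matrix W is left untouched and a trainable update (alpha / r) * B A is added, where B and A are much smaller than W. The layer size used in the example is an arbitrary assumption, not the dimensions of the model used in this project, and a real run would use a library such as Hugging Face PEFT rather than hand-written matrices.

```python
def lora_param_counts(d_out, d_in, rank):
    """Trainable parameters for full fine-tuning vs. a LoRA update of one layer."""
    full = d_out * d_in
    lora = rank * (d_in + d_out)  # B is d_out x rank, A is rank x d_in
    return full, lora


def matmul(a, b):
    """Plain nested-list matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def apply_lora(w, b, a, alpha, rank):
    """Return W + (alpha / rank) * B @ A without modifying W."""
    scale = alpha / rank
    delta = matmul(b, a)
    return [[w[i][j] + scale * delta[i][j] for j in range(len(w[0]))]
            for i in range(len(w))]


if __name__ == "__main__":
    full, lora = lora_param_counts(2048, 2048, 8)  # illustrative layer size
    print(f"full: {full}, lora: {lora}, ratio: {lora / full:.4f}")
```

In this example a rank-8 update trains under 1% of the parameters of the full layer, which is what makes this kind of fine-tuning feasible on modest hardware.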

Mini Redis in C++

C++ Threads Non Blocking IO Syscalls Hash tables Serialisation AVL Trees Heaps

An implementation of an in-memory key-value store. It covers interacting with the kernel through syscalls, setting up a server capable of handling a large number of connections, and managing memory for the hash table.

LLM assisted streamed audio summarizer

Python LLMs AWS Sockets React

I built this tool before the release of Whisper. It utilizes AWS Transcribe to stream audio from a microphone and transcribe it. The transcribed text is then passed to a fine-tuned language model (LLM) to generate a summarized version of the audio. This tool is helpful for obtaining conversation summaries while the conversations are ongoing. Everything is tied together in a React UI.

Akatosh API

TypeScript TypeORM Mocha Chai Stryker Winston AWS CloudWatch Docker Elastic Beanstalk

A simple CRUD API in TypeScript that connects to a database and handles data processing. It delivers the main dataset used in Akatosh.

Automated machine learning model for the detection of outlier car prices

Python Pandas Machine learning Scraping Data processing Scaling Skew resolution Cron Beautiful Soup

A machine learning model that detects cars listed on autotrader.ie/autotrader.co.uk at lower-than-expected prices. The model is trained on public car sales datasets. A cron job scrapes car data automatically, and an email notification is sent when a car matching specific criteria is found.
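The idea of flagging "lower-than-expected" prices can be illustrated with a much simpler statistical stand-in than the trained model described above: compare each listing with the median price for its model, scaled by the median absolute deviation (MAD). The field names, the threshold, and the sample listings are assumptions for this sketch.

```python
from collections import defaultdict
from statistics import median


def underpriced(listings, threshold=3.0):
    """Flag listings priced far below the median for their model.

    Uses a robust z-score: (median - price) / (1.4826 * MAD).
    """
    by_model = defaultdict(list)
    for item in listings:
        by_model[item["model"]].append(item["price"])

    flagged = []
    for item in listings:
        prices = by_model[item["model"]]
        mid = median(prices)
        mad = median([abs(p - mid) for p in prices])
        if mad == 0:
            continue  # no spread, so nothing can be called an outlier
        score = (mid - item["price"]) / (1.4826 * mad)
        if score > threshold:
            flagged.append(item)
    return flagged


# Hypothetical scraped listings.
LISTINGS = [
    {"id": 1, "model": "A", "price": 10000},
    {"id": 2, "model": "A", "price": 10500},
    {"id": 3, "model": "A", "price": 9800},
    {"id": 4, "model": "A", "price": 10200},
    {"id": 5, "model": "A", "price": 9900},
    {"id": 6, "model": "A", "price": 4000},
]

if __name__ == "__main__":
    for car in underpriced(LISTINGS):
        print("possible bargain:", car)
```

A learned model can condition on year, mileage, and trim, which this sketch ignores. The median and MAD are used here because a few extreme listings do not distort them the way they would a mean and standard deviation.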

LLM assisted PDF to Anki Flashcard tool

Python Streamlit React Scraping OpenAI LLMs Fine tuning

I built this for personal use, mainly for studying. It features a simple UI for uploading PDFs. It uses a fine-tuned large language model (LLM) from OpenAI to generate flashcards for selected sections of the PDF. Users can choose the best flashcards from the generated list, which are then automatically added to an Anki deck for spaced repetition learning. Possible extensions include vector databases for image recognition and reinforcement learning from human feedback (RLHF) based on the deck's card performance.

Simple HTTP proxy server

Python FastAPI Redis HTTP Caching Monitoring

This is a simple proxy server that routes HTTP requests. The proxy uses Redis to cache pages a user has already visited. HTTPS connections are passed through without caching. It also monitors latency and the requests going through the proxy.
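The caching behaviour can be sketched without any network or Redis dependency. In the example below a plain dictionary stands in for Redis, and the upstream fetch function and clock are injected so the logic can be exercised in isolation. The class name, the TTL value, and the hit/miss counters are assumptions for illustration, not the project's actual design.

```python
import time


class CachingFetcher:
    """Cache response bodies by URL with a time-to-live.

    A dict stands in for Redis here; `fetch` is any callable that takes a
    URL and returns the response body.
    """

    def __init__(self, fetch, ttl_seconds=60.0, clock=time.monotonic):
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        self._store = {}  # url -> (expires_at, body)
        self.hits = 0
        self.misses = 0

    def get(self, url):
        now = self._clock()
        entry = self._store.get(url)
        if entry is not None and entry[0] > now:
            self.hits += 1
            return entry[1]
        # Missing or expired: go upstream and refresh the entry.
        self.misses += 1
        body = self._fetch(url)
        self._store[url] = (now + self._ttl, body)
        return body


if __name__ == "__main__":
    fetcher = CachingFetcher(lambda url: f"<html>{url}</html>", ttl_seconds=5)
    fetcher.get("http://example.com/")
    fetcher.get("http://example.com/")
    print("hits:", fetcher.hits, "misses:", fetcher.misses)
```

Injecting the clock keeps expiry testable without sleeping, and the counters give a starting point for the kind of request monitoring the proxy performs.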