Past work
Deep search on Canadian NI 43-101 documentation
Mining
AI/ML
Python
RL
NI 43-101 is the reporting standard that mining and mineral exploration companies registered in Canada are legally required to follow. Reports filed under it contain a great deal of information about the state of exploration and mining work carried out on projects across the world. However, searching these documents effectively is a huge challenge. Deep search for NI 43-101 reports used reinforcement learning and large language models to pick out and extract the best available data from millions of pages of documents, helping researchers answer questions about historical data and find connections across disparate data sources and mineral exploration projects.
Geospatial tenements tooling
GeoJSON
React
Data pipelines
A recurring problem in the mineral exploration industry is keeping up to date with changes in the ownership rights of mineral exploration stakes. It is highly beneficial to know when a stake is set to expire and which properties are currently available. This information can drastically improve the efficiency of exploration programs by allowing researchers to quickly claim projects they are interested in. This tool was built to let users search for tenements across major mining jurisdictions such as Australia and Canada. It allowed them to set areas or properties of interest, search through competitors' staking activity, and compare open tenements with existing public data published in NI 43-101 and JORC documents.
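As a rough illustration of the expiry search described above, here is a minimal sketch that filters a GeoJSON FeatureCollection for tenements expiring within a given window. The property names (`tenement_id`, `expiry_date`), the sample data, and the function name are all hypothetical; the real tool's schema is not described here.

```python
from datetime import date, timedelta


def _square(x, y):
    """Build a 1x1 GeoJSON polygon with its lower-left corner at (x, y)."""
    ring = [[x, y], [x + 1, y], [x + 1, y + 1], [x, y + 1], [x, y]]
    return {"type": "Polygon", "coordinates": [ring]}


# Hypothetical sample data: ids, dates and property names are assumptions.
SAMPLE = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "geometry": _square(0, 0),
         "properties": {"tenement_id": "T-001", "expiry_date": "2024-02-15"}},
        {"type": "Feature", "geometry": _square(1, 0),
         "properties": {"tenement_id": "T-002", "expiry_date": "2025-06-30"}},
        {"type": "Feature", "geometry": _square(2, 0),
         "properties": {"tenement_id": "T-003", "expiry_date": "2024-01-20"}},
        {"type": "Feature", "geometry": _square(3, 0),
         "properties": {"tenement_id": "T-004", "expiry_date": None}},
    ],
}


def expiring_tenements(collection, today, within_days):
    """Return features expiring between `today` and `today + within_days`,
    soonest first. Features without an expiry date are skipped."""
    cutoff = today + timedelta(days=within_days)
    hits = []
    for feature in collection["features"]:
        raw = feature["properties"].get("expiry_date")
        if raw is None:
            continue
        expiry = date.fromisoformat(raw)
        if today <= expiry <= cutoff:
            hits.append((expiry, feature))
    hits.sort(key=lambda pair: pair[0])
    return [feature for _, feature in hits]


if __name__ == "__main__":
    for f in expiring_tenements(SAMPLE, date(2024, 1, 1), 60):
        print(f["properties"]["tenement_id"], f["properties"]["expiry_date"])
```

The result is still a list of GeoJSON features, so it can be passed directly to a map layer on the front end.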
Seq-2-Seq recurrent neural net, NER model for keyword classification
Python
PyTorch
Deep learning
Recurrent neural nets
Seq-2-Seq
Synthetic data
Named entity recognition
Large language models
Fine tuning
This is a recurrent neural network (RNN) model built with PyTorch. It is a sequence-to-sequence (seq2seq) model that takes tokenized text and predicts, for each word, a class from a fixed classification vocabulary. The model was built and trained from scratch, using a novel method of constructing a synthetic training set with fine-tuned large language models (LLMs). This approach bootstrapped the model to high performance. It is well suited to low-latency text classification on specific vocabularies.
Polya - AI mathematics tutor
Python
FastAPI
OpenAI
SymPy
React
A mathematics tutoring application with a FastAPI backend that uses SymPy to generate symbolically checked mathematics problems with correct solutions. The application generates questions and uses a fine-tuned large language model (LLM) to guide users through the problem-solving process, following the framework in Polya's "How to Solve It". Problems are rendered in LaTeX using KaTeX. The application implements a spaced repetition algorithm for questions and allows users to categorize questions as easy, difficult, or hard. Work has begun on a seq2seq model for solving mathematics problems that are beyond the scope of symbolic checkers.
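The spaced repetition algorithm is not specified above, so the following is only a sketch of one common choice, an SM-2-style scheduler. The `Card` fields, the 0 to 5 quality scale, and the `review` function are assumptions for illustration, not the application's actual code.

```python
from dataclasses import dataclass


@dataclass
class Card:
    repetitions: int = 0    # consecutive successful reviews
    interval_days: int = 0  # days until the next review
    ease: float = 2.5       # ease factor, never allowed below 1.3


def review(card: Card, quality: int) -> Card:
    """Return the card's new state after a review.

    `quality` is a 0-5 self-rating (assumed scale); below 3 counts as a lapse.
    """
    if not 0 <= quality <= 5:
        raise ValueError("quality must be between 0 and 5")

    if quality < 3:
        # Lapse: restart the repetition sequence, keep the ease factor.
        return Card(repetitions=0, interval_days=1, ease=card.ease)

    # SM-2 ease update: high ratings raise the ease, low ratings reduce it.
    penalty = (5 - quality) * (0.08 + (5 - quality) * 0.02)
    ease = max(1.3, card.ease + 0.1 - penalty)

    repetitions = card.repetitions + 1
    if repetitions == 1:
        interval = 1
    elif repetitions == 2:
        interval = 6
    else:
        interval = round(card.interval_days * ease)
    return Card(repetitions=repetitions, interval_days=interval, ease=ease)


if __name__ == "__main__":
    card = Card()
    for q in (5, 5, 5, 2):
        card = review(card, q)
        print(q, card)
```

A three-level rating such as the one the application exposes could be mapped onto the quality scale before calling `review`; how the real application does this is not stated.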
Micro GitHub Copilot for MoonScript with LoRA/PEFT
Python
Deep learning
Language models
Hugging Face
LoRA
PEFT
EleutherAI Pythia
BigCode - The Stack
The goal is to achieve functionality similar to GitHub Copilot, but specifically for a small language called MoonScript. The project uses open-source models and datasets: it fine-tunes an EleutherAI Pythia model using low-rank adaptation (LoRA), a parameter-efficient fine-tuning (PEFT) method. The model is then quantized to 8 bits so that it can run on smaller systems.
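To give a feel for what 8-bit quantization does to a model's weights, here is a minimal pure-Python sketch of symmetric absmax quantization. The scheme and library actually used in the project are not stated above, so absmax is an assumption, and the function names are hypothetical.

```python
def quantize_absmax(weights):
    """Map floats to signed 8-bit integers in [-127, 127].

    Returns the integer codes and the scale needed to recover the floats.
    """
    absmax = max(abs(w) for w in weights)
    if absmax == 0:
        return [0] * len(weights), 1.0
    scale = absmax / 127  # one integer step, expressed in float units
    codes = [max(-127, min(127, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes, scale):
    """Recover approximate floats from the 8-bit codes."""
    return [c * scale for c in codes]


if __name__ == "__main__":
    weights = [1.27, -0.64, 0.0, 0.32]
    codes, scale = quantize_absmax(weights)
    restored = dequantize(codes, scale)
    for w, c, r in zip(weights, codes, restored):
        print(f"{w:+.4f} -> {c:+4d} -> {r:+.4f}")
```

Each weight then takes one byte instead of two or four, at the cost of a rounding error of at most half a quantization step per value.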
Mini Redis in C++
C++
Threads
Non-blocking I/O
Syscalls
Hash tables
Serialisation
AVL Trees
Heaps
An implementation of an in-memory key-value store. It covers interacting with the kernel through syscalls, setting up a server capable of handling large numbers of connections, and managing the memory needs of the hash table.
LLM-assisted streamed audio summarizer
Python
LLMs
AWS
Sockets
React
I built this tool before the release of Whisper. It streams audio from a microphone to AWS Transcribe for transcription. The transcribed text is then passed to a fine-tuned large language model (LLM) to generate a summary of the audio. This is helpful for obtaining conversation summaries while the conversations are still ongoing. Everything is tied together in a React UI.
Akatosh API
TypeScript
TypeORM
Mocha
Chai
Stryker
Winston
AWS CloudWatch
Docker
Elastic Beanstalk
A TypeScript API that provides simple CRUD operations, connects to a database, and handles data processing. It is used to deliver the main dataset used in Akatosh.
Automated machine learning model for the detection of outlier car prices
Python
Pandas
Machine learning
Scraping
Data processing
Scaling
Skew resolution
CRON
Beautiful Soup
A machine learning model for detecting cars listed on autotrader.ie and autotrader.co.uk at lower-than-expected prices. The model is trained on public car sales datasets. A cron job runs the scraping of car listings automatically, and an email notification is sent when a car matching specific criteria is found.
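The model itself is not described above, so the sketch below swaps in the simplest possible stand-in: a one-feature least-squares fit of price against age, with a listing flagged when its price falls well below the fitted value. The feature choice, the 20% threshold, the field names, and all data are assumptions for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = cov_xy / var_x
    return slope, mean_y - slope * mean_x


def underpriced(listings, slope, intercept, discount=0.2):
    """Return listings priced more than `discount` below the fitted price."""
    flagged = []
    for item in listings:
        expected = slope * item["age"] + intercept
        if item["price"] < expected * (1 - discount):
            flagged.append(item)
    return flagged


# Hypothetical training data: car age in years and sale price.
TRAIN_AGES = [1, 2, 3, 4, 5]
TRAIN_PRICES = [27000, 24000, 21000, 18000, 15000]

# Hypothetical scraped listings.
LISTINGS = [
    {"id": "a", "age": 3, "price": 20500},
    {"id": "b", "age": 3, "price": 12000},
    {"id": "c", "age": 5, "price": 11000},
]

if __name__ == "__main__":
    slope, intercept = fit_line(TRAIN_AGES, TRAIN_PRICES)
    for item in underpriced(LISTINGS, slope, intercept):
        print("possible bargain:", item)
```

In a scheduled run, the flagged listings would be what triggers the email notification.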
LLM-assisted PDF to Anki flashcard tool
Python
Streamlit
React
Scraping
OpenAI LLMs
Fine tuning
I built this for personal use, mainly for studying. It features a simple UI for uploading PDFs and uses a fine-tuned large language model (LLM) from OpenAI to generate flashcards for selected sections of the PDF. Users can choose the best flashcards from the generated list, which are then automatically added to an Anki deck for spaced repetition learning. Possible future features include vector databases for image recognition and reinforcement learning from human feedback (RLHF) based on how the deck's cards perform.
Simple HTTP proxy server
Python
FastAPI
Redis
HTTP
Caching
Monitoring
This is a simple proxy server that routes HTTP requests. The proxy uses Redis to cache pages a user has already visited. HTTPS connections are passed through. It also monitors latency and the requests going through the proxy.
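The caching and monitoring flow can be sketched as follows. To keep the example self-contained, an in-memory dictionary with a time-to-live stands in for Redis, and the upstream fetch is passed in as a plain function. The class, the function names, and the TTL are assumptions, not the proxy's actual code.

```python
import time


class TTLCache:
    """In-memory stand-in for a Redis cache with per-key expiry."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._store = {}  # key -> (expires_at, value)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._store[key]  # expired: drop it and report a miss
            return None
        return value

    def set(self, key, value):
        self._store[key] = (self._clock() + self._ttl, value)


def fetch_via_cache(url, cache, fetch, stats):
    """Serve `url` from the cache if possible, otherwise call `fetch`.

    Appends one record per request to `stats` for latency monitoring.
    """
    start = time.perf_counter()
    body = cache.get(url)
    hit = body is not None
    if not hit:
        body = fetch(url)
        cache.set(url, body)
    stats.append({"url": url, "hit": hit,
                  "latency_s": time.perf_counter() - start})
    return body


if __name__ == "__main__":
    stats = []
    cache = TTLCache(ttl_seconds=60)
    fake_fetch = lambda url: f"<html>{url}</html>"  # placeholder upstream
    fetch_via_cache("http://example.com/", cache, fake_fetch, stats)
    fetch_via_cache("http://example.com/", cache, fake_fetch, stats)
    print([s["hit"] for s in stats])
```

Injecting the clock and the fetch function keeps the cache logic testable without a network or a running Redis instance.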