Suraj Thapa

Interests

Machine Learning, MLOps, LLMs, RAG, LLM fine-tuning, NLP, Computer Vision

Skills & Expertise

Languages

Python, Bash, Scala, R

LLM Experience

Retrieval-augmented generation (RAG), Fine-tuning, OpenAI, Cohere, Gemini, Agent Builder, Weaviate, GPT APIs, LangChain, Streamlit, LlamaIndex, Hugging Face

Database

PostgreSQL (pgvector), Weaviate, MySQL, CrateDB, MongoDB

AWS

AWS Bedrock, AWS SageMaker

GCP

Document AI, Vertex AI, Agent Builder

Git

GitHub, GitLab, CI/CD pipelines

Infrastructure as Code

Terraform, CloudFormation

Observability Tools

Sentry, New Relic, Papertrail

Containerization

Docker, Kubernetes, ECS, EKS

Other Experience

scikit-learn, PyTorch, NumPy, Pandas, Linux, Ansible, Docker, Web, Kafka, RabbitMQ, Airflow, Databricks, Web scraping, Jira

Education

University of Denver, Denver, CO

M.A., Global Economic Affairs
2018 - 2021

University of Idaho, Moscow, ID

B.A., Economics; Minor in Math and Statistics
2015 - 2018

Professional Experience

Machine Learning Engineer

The Texas Tribune · www.texastribune.org
2022 - Present

  • Developed AI-powered chatbots that adhere to journalistic standards, prioritizing accuracy and minimizing hallucinations.
  • Implemented hybrid search systems using PostgreSQL for internal applications, integrating semantic, fuzzy, and full-text search for Retrieval-Augmented Generation (RAG).
  • Designed a cross-encoder re-ranker for RAG applications, improving document retrieval accuracy and relevance.
  • Fine-tuned Hugging Face transformer models (e.g., BERT for named entity recognition) for internal applications.
  • Designed and managed scalable ML infrastructure for model training, testing, and deployment in cloud-native environments.
  • Created automated CI/CD pipelines for ML workflows, including data ingestion, feature engineering, model training, validation, and deployment on AWS and GCP.
  • Established observability frameworks for end-to-end monitoring of model performance, data quality, and system reliability.
  • Implemented infrastructure-as-code (IaC) solutions using Terraform and CloudFormation to maintain reproducible, version-controlled production environments.

Data Engineer II

Lightcast · lightcast.io
2021 - 2022

  • Built machine learning models to classify web pages, enhancing web scraping capabilities.
  • Implemented and maintained GitLab CI/CD workflows and AWS CodeDeploy deployments using GitLab's container services.
  • Migrated data from CrateDB to AWS RDS for PostgreSQL.
  • Used Docker to build and deploy applications.