About me

I'm a data scientist with a PhD from the University of Alberta and 7+ years of experience applying machine learning, NLP, and statistical modeling to complex, real-world problems. My doctoral research focused on how developers interact with data science library APIs and how to improve code reliability in data science workflows, combining rigorous empirical methodology with LLM-based solutions and large-scale data pipelines. I'm actively seeking senior data scientist roles where I can bring that foundation into industry: designing production-grade ML systems, running principled experiments, and translating messy, high-dimensional data into decisions that matter. I care about building things that are both statistically sound and practically deployable.

Projects

Credit Risk Decision System – End-to-End ML Service

GitHub: github.com/boneyag/CreditRisk

Production-grade ML system applying rigorous statistical methodology to real-world credit adjudication – from raw data through containerized REST API deployment

  • Designed and validated a challenger model (XGBoost vs. logistic regression baseline) using hypothesis testing: McNemar's test yielded p = 4.1e-38, and bootstrap CIs on the accuracy delta confirmed a statistically robust +3.7% gain.
  • Engineered a modular, containerized training pipeline (Docker Compose) with decoupled training/serving stages, versioned artifact manifests, and structured JSON audit logging on every prediction, mirroring production ML deployment practices.
  • Built a FastAPI REST service with automated endpoint tests (pytest) and Pydantic schema validation, achieving environment parity between training and inference containers.
  • Optimized the model for risk-centric recall on high-default applicants (F1-score: 0.92), translating business objectives into metric selection rather than defaulting to accuracy, consistent with real credit risk constraints.
  • Currently extending the system with per-prediction SHAP explanations for adjudicator transparency, and with AWS Fargate deployment and Prometheus monitoring for real-time feature drift detection.

Technical Skills

Data Analysis & Querying

SQL, CTEs, Window Functions, Subqueries, Python, pandas, NumPy, Excel

Statistical Analysis & Experimentation

Hypothesis Testing, A/B Testing, Predictive Modeling, Cluster Analysis, Model Validation, R, SciPy, scikit-learn, statsmodels

BI & Visualization

Tableau, Power BI, Looker, Excel Dashboards, Matplotlib, Plotly, Jupyter Notebook

Databases & Data Tools

PostgreSQL, MySQL, SQLAlchemy, dbt

Analytics Workflow & Engineering

Data Cleaning, Data Validation, EDA, Reproducible Research

Collaboration & Delivery

Git, GitHub, Agile, Scrum, Sprint Planning, Kanban

Professional Experience

Software Engineering Researcher (Data Analysis & ML)

University of Alberta (2020 - 2025)

GitHub: github.com/boneyag/DSChecker

  • Large-scale Data Analysis: Built automated data pipelines using APIs and web scraping to collect and analyze large datasets from GitHub and Stack Overflow, enabling data-driven analysis of software usage patterns and developer behavior.
  • Analytical Model Development: Developed a data-driven error detection and resolution system that improved detection accuracy by 22% and solution effectiveness by 63% compared to baseline approaches.
  • Statistical Evaluation & Validation: Designed evaluation frameworks using hypothesis testing, controlled experiments, and statistical performance metrics to assess model reliability and effectiveness.
  • Insight Generation: Analyzed large datasets to identify recurring patterns and root causes of technical issues, translating findings into actionable insights for improving software tools and developer workflows.
  • Cross-Functional Collaboration: Worked closely with research collaborators and developers to refine analytical models and communicate results through reports and presentations.
  • Data Visualization & Reporting: Developed dashboards and visualizations to communicate analytical findings and trends to collaborators and decision-makers.

Data Analytics Consultant

Triggericon (2024 - 2025)

  • Designed and executed A/B experiments to evaluate the relative impact of Google Ads vs. social media campaigns – including hypothesis framing, alpha/beta calibration, and sample size planning – delivering a statistically grounded channel recommendation to stakeholders.
  • Built a reusable Python sample size planning tool, enabling the team to scope experiments for routine campaign decisions without analyst involvement.
  • Analyzed consumer engagement data from Google Analytics across click-through rates, cost-per-click, and conversion rates, identifying optimization opportunities that informed campaign budget adjustments.
  • Developed an executive revenue dashboard tracking total revenue, month-over-month growth, cost-vs-revenue by channel, and new vs. returning customer segmentation.

Software Engineering Researcher (Data Analysis & NLP)

University of Lethbridge (2018 - 2020)

GitHub: github.com/boneyag/TOBE

  • Predictive Modeling: Developed supervised machine learning models for text analysis and summarization, improving performance by 23% over baseline approaches.
  • Dataset Development & Data Quality: Led large-scale data annotation and validation initiatives to build a high-quality dataset.
  • Feature Engineering & Data Representation: Designed structured data representations (TF-IDF, word embeddings) to improve analytical model accuracy and interpretability.
  • Model Evaluation & Reporting: Assessed models using statistical metrics including precision, recall, and F1-score to ensure reliable and reproducible results.

Technical Lead & Project Coordinator

University of Alberta (2020 - 2025)

  • Agile Project Leadership: Mentored 20+ software development teams through the full project lifecycle using Agile practices, including sprint planning, Scrum, and Kanban boards.
  • Stakeholder Communication: Acted as liaison between student teams and course instructors, helping clarify project requirements and ensure alignment with objectives.
  • Technical Guidance: Provided guidance on system design, databases, version control, and testing practices while conducting code reviews to maintain quality standards.
  • Project Monitoring & Delivery: Tracked project progress, resolved development challenges, and supported teams in delivering projects on schedule.

Education

PhD in Computer Science (2025)

University of Alberta, Canada

MSc in Computer Science (2020)

University of Lethbridge, Canada

MPhil in Computer Science (2018)

University of Peradeniya, Sri Lanka

BSc in Computer Science (2010)

University of Peradeniya, Sri Lanka

Selected Publications

  • Detecting and Fixing API Misuses of Data Science Libraries Using Large Language Models
    Akalanka Galappaththi, Francisco Ribeiro, Sarah Nadi
    35th IEEE International Conference on Collaborative Advances in Software and Computing (CASCON), 2025 [arxiv]
  • An Empirical Study of API Misuses of Data-Centric Libraries
    Akalanka Galappaththi, Sarah Nadi, Christoph Treude
    18th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM), 2024 [paper]
  • Does This Apply to Me? An Empirical Study of Technical Context in Stack Overflow
    Akalanka Galappaththi, Sarah Nadi, Christoph Treude
    19th International Conference on Mining Software Repositories (MSR), 2022 [paper]
  • Automatically Annotating Sentences for Task-specific Bug Report Summarization
    Akalanka Galappaththi, John Anvik, Rafat Islam
    36th International Conference on Automated Software Engineering (ASE), 2021 [paper]

Recognition

Services

  • Program committee member - Tool Demo Track (SANER '25)
  • Junior program committee member (MSR '23)

Scholarships

  • Alberta Graduate Excellence Scholarship (2024, 2021)
  • University of Alberta Doctoral Recruitment Scholarship (2020)
  • Alberta Innovates Graduate Student Scholarship (2019)

Recognition

  • Recognized for leadership and mentorship service to CREATE SE4AI