I'm a data scientist with a PhD from the University of Alberta and 7+ years of experience applying machine learning, NLP, and statistical modeling to complex, real-world problems. My doctoral research focused on how developers interact with data science library APIs and how to improve code reliability in data-science workflows — work that combined rigorous empirical methodology with LLM-based solutions and large-scale data pipelines. I'm actively seeking senior data scientist roles where I can bring that foundation into industry — designing production-grade ML systems, running principled experiments, and translating messy, high-dimensional data into decisions that matter. I care about building things that are both statistically sound and practically deployable.
Projects
Credit Risk Decision System – End-to-End ML Service
GitHub: github.com/boneyag/CreditRisk
Production-grade ML system applying rigorous statistical methodology to real-world credit adjudication – from raw data through containerized REST API deployment
- Designed and validated a challenger model (XGBoost vs. logistic regression baseline) using rigorous hypothesis testing: McNemar's test yielded p = 4.1e-38 and bootstrap CIs on accuracy delta confirmed a statistically robust +3.7% gain.
- Engineered a modular, containerized training pipeline (Docker Compose) with decoupled training/serving stages, versioned artifact manifests, and structured JSON audit logging on every prediction – mirroring production ML deployment practices.
- Built a FastAPI REST service with automated endpoint tests (pytest) and Pydantic schema validation, achieving environment parity between training and inference containers.
- Optimized the model for risk-centric recall on high-default applicants (F1-score: 0.92), translating business objectives into metric selection rather than defaulting to accuracy – consistent with real credit-risk constraints.
- Currently extending the system with SHAP-based explanations per prediction for adjudicator transparency, plus AWS Fargate deployment with Prometheus monitoring for real-time feature-drift detection.
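For illustration, the challenger-vs-baseline validation described above (McNemar's test plus a bootstrap CI on the accuracy delta) can be sketched as below. The correctness flags are synthetic stand-ins for the real hold-out predictions, so the printed numbers are illustrative only, not the reported results:

```python
import numpy as np
from statsmodels.stats.contingency_tables import mcnemar

rng = np.random.default_rng(0)

# Synthetic per-example correctness flags for the two models
# (stand-ins for real hold-out predictions on credit data).
n = 2000
baseline_correct = rng.random(n) < 0.85    # logistic regression baseline
challenger_correct = rng.random(n) < 0.89  # XGBoost challenger

# 2x2 contingency table of agreement/disagreement between the models
table = np.array([
    [np.sum(baseline_correct & challenger_correct),
     np.sum(baseline_correct & ~challenger_correct)],
    [np.sum(~baseline_correct & challenger_correct),
     np.sum(~baseline_correct & ~challenger_correct)],
])
result = mcnemar(table, exact=False, correction=True)
print(f"McNemar p-value: {result.pvalue:.3g}")

# Percentile bootstrap CI on the accuracy delta (challenger - baseline)
deltas = []
for _ in range(2000):
    idx = rng.integers(0, n, n)
    deltas.append(challenger_correct[idx].mean() - baseline_correct[idx].mean())
lo, hi = np.quantile(deltas, [0.025, 0.975])
print(f"95% bootstrap CI on accuracy delta: [{lo:.3f}, {hi:.3f}]")
```

A CI that excludes zero, together with a small McNemar p-value, is what justifies calling the challenger's gain statistically robust rather than noise.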
Technical Skills
Data Analysis & Querying: SQL (CTEs, Window Functions, Subqueries), Python (pandas, NumPy), Excel
Statistical Analysis & Experimentation: Hypothesis Testing, A/B Testing, Predictive Modeling, Cluster Analysis, Model Validation, R, SciPy, scikit-learn, statsmodels
BI & Visualization: Tableau, Power BI, Looker, Excel Dashboards, Matplotlib, Plotly, Jupyter Notebook
Databases & Data Tools: PostgreSQL, MySQL, SQLAlchemy, dbt
Analytics Workflow & Engineering: Data Cleaning, Data Validation, EDA, Reproducible Research
Collaboration & Delivery: Git, GitHub, Agile, Scrum, Sprint Planning, Kanban
Professional Experience
Software Engineering Researcher (Data Analysis & ML)
University of Alberta (2020 - 2025)
GitHub: github.com/boneyag/DSChecker
- Large-scale Data Analysis: Built automated data pipelines using APIs and web scraping to collect and analyze large datasets from GitHub and Stack Overflow, enabling data-driven analysis of software usage patterns and developer behavior.
- Analytical Model Development: Developed a data-driven error detection and resolution system that improved detection accuracy by 22% and solution effectiveness by 63% compared to baseline approaches.
- Statistical Evaluation & Validation: Designed evaluation frameworks using hypothesis testing, controlled experiments, and statistical performance metrics to assess model reliability and effectiveness.
- Insight Generation: Analyzed large datasets to identify recurring patterns and root causes of technical issues, translating findings into actionable insights for improving software tools and developer workflows.
- Cross-Functional Collaboration: Worked closely with research collaborators and developers to refine analytical models and communicate results through reports and presentations.
- Data Visualization & Reporting: Developed dashboards and visualizations to communicate analytical findings and trends to collaborators and decision-makers.
Data Analytics Consultant
Triggericon (2024 - 2025)
- Designed and executed A/B experiments to evaluate the relative impact of Google Ads vs. social media campaigns – including hypothesis framing, alpha/beta calibration, and sample size planning – delivering a statistically grounded channel recommendation to stakeholders.
- Built a reusable Python sample size planning tool, enabling the team to independently scope future experiments without analyst involvement for routine campaign decisions.
- Analyzed consumer engagement data from Google Analytics across click-through rates, cost-per-click, and conversion rates, identifying optimization opportunities that informed campaign budget adjustments.
- Developed an executive revenue dashboard tracking total revenue, month-over-month growth, cost-vs-revenue by channel, and new vs. returning customer segmentation.
Software Engineering Researcher (Data Analysis & NLP)
University of Lethbridge (2018 - 2020)
GitHub: github.com/boneyag/TOBE
- Predictive Modeling: Developed supervised machine learning models for text analysis and summarization, improving performance by 23% over baseline approaches.
- Dataset Development & Data Quality: Led large-scale data annotation and validation initiatives to build a high-quality dataset.
- Feature Engineering & Data Representation: Designed structured data representations (TF-IDF, word embeddings) to improve analytical model accuracy and interpretability.
- Model Evaluation & Reporting: Assessed models using statistical metrics including precision, recall, and F1-score to ensure reliable and reproducible results.
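The TF-IDF representation mentioned above can be sketched with scikit-learn as follows (the toy sentences stand in for the annotated bug-report corpus, which is not reproduced here):

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Toy bug-report sentences standing in for the real annotated corpus
sentences = [
    "Steps to reproduce the crash on startup",
    "Expected behavior: the app saves the file",
    "Stack trace shows a null pointer in the parser",
]

vectorizer = TfidfVectorizer(lowercase=True, stop_words='english')
X = vectorizer.fit_transform(sentences)  # sparse (n_sentences, n_terms) matrix

print(X.shape)                            # one row per sentence
print(sorted(vectorizer.vocabulary_)[:5]) # sample of the learned vocabulary
```

Each row is a weighted term vector, which keeps the features directly interpretable: high-weight terms are the ones most distinctive of that sentence.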
Technical Lead & Project Coordinator
University of Alberta (2020 - 2025)
- Agile Project Leadership: Mentored 20+ software development teams through the full project lifecycle using Agile practices, including sprint planning, Scrum, and Kanban boards.
- Stakeholder Communication: Acted as liaison between student teams and course instructors, helping clarify project requirements and ensure alignment with objectives.
- Technical Guidance: Provided guidance on system design, databases, version control, and testing practices while conducting code reviews to maintain quality standards.
- Project Monitoring & Delivery: Tracked project progress, resolved development challenges, and supported teams in delivering projects on schedule.
Education
PhD in Computer Science (2025)
University of Alberta, Canada
MSc in Computer Science (2020)
University of Lethbridge, Canada
MPhil in Computer Science (2018)
University of Peradeniya, Sri Lanka
BSc in Computer Science (2010)
University of Peradeniya, Sri Lanka
Selected Publications
- Detecting and Fixing API Misuses of Data Science Libraries Using Large Language Models. Akalanka Galappaththi, Francisco Ribeiro, Sarah Nadi. 35th IEEE International Conference on Collaborative Advances in Software and Computing (CASCON), 2025 [arxiv]
- An Empirical Study of API Misuses of Data-Centric Libraries. Akalanka Galappaththi, Sarah Nadi, Christoph Treude. 18th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM), 2024 [paper]
- Does This Apply to Me? An Empirical Study of Technical Context in Stack Overflow. Akalanka Galappaththi, Sarah Nadi, Christoph Treude. 19th International Conference on Mining Software Repositories (MSR), 2022 [paper]
- Automatically Annotating Sentences for Task-specific Bug Report Summarization. Akalanka Galappaththi, John Anvik, Rafat Islam. 36th International Conference on Automated Software Engineering (ASE), 2021 [paper]
Recognition & Service
Service
- Program committee member - Tool Demo Track (SANER '25)
- Junior program committee member (MSR '23)
Scholarships
- Alberta Graduate Excellence Scholarship (2024, 2021)
- University of Alberta Doctoral Recruitment Scholarship (2020)
- Alberta Innovates Graduate Student Scholarship (2019)
Recognition
- Appreciation for leadership and mentorship service to CREATE SE4AI