👋 About Me

I am a Senior Data Scientist at Optum (UnitedHealth Group), working at the intersection of Large Language Models (LLMs), healthcare systems, and enterprise AI.
My work focuses on transforming general-purpose language models into clinically specialized systems that operate under real-world constraints such as interpretability, regulatory safety, workflow alignment, and domain knowledge grounding.

Occasionally, I journal and reflect on research, work, and life on Substack. If you’re curious about the journey behind the work, you’re welcome to read along. ✦


What I Do

🧠 Healthcare-Focused Language Models

I design and deploy domain-adapted LLMs that support:

  • Automated medical coding and clinical documentation intelligence
  • Long-sequence chart and encounter reasoning
  • Improved understanding of provider–patient communication
  • Evaluation and safety frameworks aligned with clinical correctness

☁️ Large-Scale Training & Deployment

I build and maintain multi-cloud ML infrastructure across:

  • AWS SageMaker and Azure AI Foundry
  • Distributed GPU clusters (e.g., NC80adis_H100_v5)
  • Containerized CI/CD workflows for model training and inference

This work spans large-scale data ingestion, preprocessing, fine-tuning, versioning, benchmarking, and production deployment.


Professional Experience

I have worked across industry, research, and academia, building machine learning systems and guiding others in how to think and experiment in this space.

  • 🎓 Computer Science Lecturer — UNCG
    • Taught Systems Programming, Advanced Data Structures, and Data Science
    • Mentored students in research thinking and applied experimentation
    • Advised students on research and academic development
  • 🧪 Data Scientist — DevResonance Ltd.
    • Worked on public health and social impact analytics
    • Built end-to-end machine learning workflows using Python, PyTorch, and scikit-learn
    • Developed interactive analysis dashboards with Streamlit and Plotly
    • Collaborated with global health organizations on data-driven decision systems
  • 🛠️ Data Science Intern — Redgreen Corporation
    • Applied statistical modeling and predictive analytics to support business operations
    • Worked with feature engineering, regression pipelines, and exploratory analytics
    • Contributed to internal dashboards for business insight and product decision-making

These experiences collectively shaped how I approach problem formulation, data reasoning, model alignment with real-world constraints, and measurable impact.


Academic Background

I completed my M.Sc. in Computer Science at the University of North Carolina at Greensboro (UNCG), where I worked as a graduate researcher in the IFFS-ML Lab under Dr. Shan Suthaharan.
My thesis — LDEB: Label Digitization with Emotion Binarization — proposed a structured encoding approach for emotion recognition in conversational dialogues, enabling more stable learning in label-sparse environments.

Before that, I earned my B.Sc. in Computer Science & Engineering from BRAC University, where I conducted research under Dr. Amitabha Chakrabarty.
My undergraduate thesis — Fake News Pattern Recognition using Linguistic Analysis — explored how linguistic cues can reveal author bias and deceptive intent in political social media discourse.


Research Themes

  • 🏥 Healthcare LLMs: domain adaptation aligned with clinical language and workflows
  • 📝 Long-Sequence Modeling: reasoning across multi-note patient histories
  • 🔄 Model Adaptation Pipelines: embedding spaces, structured prompting, evaluation
  • 🎛️ Past Work: multimodal misinformation detection, conversational AI, cross-lingual adaptation

Publications

  • LDEB — Label Digitization with Emotion Binarization and Machine Learning for Emotion Recognition in Conversational Dialogues

    2023 • arXiv

    Paper

    Abstract
    Emotion recognition in conversations (ERC) is vital to the advancements of conversational AI and its applications. Therefore, the development of an automated ERC model using machine learning (ML) is beneficial. However, conversational dialogues present nested emotions that entangle emotional descriptors with the emotion label. LDEB resolves this through digitization and binarization, enabling more meaningful model training. We evaluate the proposed method using hierarchical RF and ANN models on the FETA-DailyDialog dataset, demonstrating promising accuracy and precision.

    Citation: Dey, A., & Suthaharan, S. (2023). LDEB -- Label Digitization with Emotion Binarization and Machine Learning for Emotion Recognition in Conversational Dialogues. arXiv:2306.02193.

  • Fake News Pattern Recognition using Linguistic Analysis

    2018 • ICIEV & icIVPR, Kitakyushu, Japan

    Paper

    Abstract
    This work presents a framework for detecting deception and bias in political social media content. Using linguistic normalization, feature extraction, pattern recognition, and k-nearest neighbor classification on tweets related to the 2016 election, the study demonstrates how automated linguistic analysis can support fake news detection and evaluation.

    Citation: A. Dey, R. Z. Rafi, S. H. Parash, S. K. Arko and A. Chakrabarty (2018). Fake News Pattern Recognition using Linguistic Analysis. ICIEV & icIVPR.