Hello, I am

Najaf Murtaza

Data Scientist & AI Engineer

Specializing in Natural Language Processing, Machine Learning, and Deep Learning infrastructure to build intelligent systems.

Let's Connect

About Me

Data Scientist at Turing, leading R&D initiatives focused on improving Large Language Models through structured evaluation, ablation analysis, and high-quality data generation. I build automated model retraining workflows and scalable AI systems that enhance accuracy, efficiency, and user satisfaction. Previously, I worked at Kyndryl (formerly part of IBM), developing ML/NLP solutions for intelligent ticket routing, spam detection, and large-scale data processing that reduced operational costs and improved resolution time.

Experience

March 2024 to Present

Data Scientist

Turing

Directed a team in solving complex NP-hard optimization problems using systematic methodologies and Large Language Model (LLM)-driven development frameworks; ensured production-quality implementations with efficient algorithms, clean architecture, unit/integration test suites, and strict coding standards compliance.
Applied rigorous ablation analysis during AI model development to quantify feature importance and training strategy effectiveness, resulting in streamlined models with improved accuracy and computational efficiency.
Built and maintained reusable Pulumi-based IaC modules to standardize infrastructure provisioning, improving environment consistency, reducing configuration drift, and accelerating deployment cycles.
Built an automated model retraining workflow that ingests new data, monitors distribution shifts, and updates models to adapt to changing data patterns, enhancing performance and reducing manual intervention.
Spearheaded efforts to compare AI completions, provide logical explanations for ratings, and craft ideal responses.
Developed an AI-powered comment moderation and insight platform leveraging NLP to detect abusive language, summarize long threads, and extract emerging trends, improving moderation efficiency and community engagement.
Engineered adversarial and real-world task prompts spanning Python, data processing, and machine learning workflows to benchmark model limitations; applied structured human-feedback evaluation techniques to assess and rank multi-candidate outputs.
Created domain-diverse stress tests to evaluate LLM reasoning in coding and analytical tasks, leveraging structured RLHF evaluation protocols to score, compare, and refine multi-output generations.
Generated data that enables an AI system to use a code execution environment effectively, so that it runs the appropriate code for each user prompt and delivers accurate, context-specific responses within the conversation.

September 2021 to March 2024

Data Scientist (Contractor)

Kyndryl

Served on a system design team that upgraded the existing product into scalable, containerized microservices, resulting in higher efficiency and faster development and improvement of independent features.
Developed a distributed data processing pipeline using Apache Kafka for ingesting raw event streams and Elasticsearch for cleaned, structured indexing, enabling reliable data transformation and fast downstream querying.
Worked with cross-functional teams to ensure robust product development and client satisfaction.
Engaged with clients to gather feedback and understand their requirements.
Automated AI pipelines so that they adjust to data shift and learn new patterns quickly.

August 2019 to August 2021

Data Scientist (Contractor)

IBM

Used state-of-the-art tools, techniques, and algorithms to develop and improve Automated Ticket Routing and Assignment.
Developed a binary text classifier to detect spam tickets, saving the time and money otherwise spent resolving them.
Implemented several clustering techniques on text data to raise alerts when similar problems or incidents occur close together in time.
Updated the word embeddings for text data, increasing performance by 15% with vectors one-fifth the length of the previous ones.
Improved the performance of existing machine learning and deep learning systems and ensured they were production-ready.
Developed scripts to automate the analysis, cleaning, and visualization of text data.
Checked for security issues while developing systems and ensured they passed the penetration testing phase.
Developed a set of useful practices for handling imbalanced datasets.

April 2019 to July 2019

Graduate Analytics Intern

IBM

Performed analysis and manipulation of datasets.
Developed, executed, and validated machine learning models.
Reported key findings and model performance on a daily and weekly basis.
Improved the performance of existing scripts and fixed bugs in them.

Education

2014 to 2019

Bachelor of Science in Computer Science

FAST - National University of Computer and Emerging Sciences

Specialized in Deep Learning and Data Analysis. Built foundational expertise in neural networks (CNNs, RNNs, LSTMs), natural language processing, and data engineering using industry-standard Python ecosystems (TensorFlow, PyTorch, Pandas).

Skills & Tech Stack

Deep Learning & NLP

Large Language Models, BERT, Transformers (Attention), Word Embeddings (Word2Vec, FastText), CNN/RNN architectures, GANs, and IBM Watson.

Machine Learning

Predictive Modeling, Clustering (K-Means, DBSCAN), Dimensionality Reduction (PCA, t-SNE), Ensembles (Random Forests), and SVMs.

Python Ecosystem

PyTorch, TensorFlow, Keras, FastAI, Scikit-Learn, Pandas, NumPy, and data scraping utilities (Scrapy, BeautifulSoup).

Cloud & Data Engineering

Google Cloud Platform (GCP), Apache Kafka streams, Ubuntu/Linux server administration, and Jupyter environments.

Certifications & Accomplishments

Natural Language Processing Nanodegree

Udacity | June 2020

See credential →

Deep Learning Nanodegree

Udacity | January 2020

See credential →

Deep Learning Specialization

Coursera | August 2018

See credential →

Top Performer Award

Kyndryl | September 2022

Recognized as a top performer for the year 2021-2022.

View post →

Udacity Nanodegree Scholarship

Facebook | September 2019

Selected as one of the top 200 of 10,000 contestants for a 100% scholarship to the Deep Learning Nanodegree.

Contact

Get In Touch

Interested in collaborating or have a question about AI and Data Science? Drop a message.