Hello, I am
Najaf Murtaza
Data Scientist & AI Engineer
Specializing in Natural Language Processing, Machine Learning, and Deep Learning infrastructure to build intelligent systems.
About Me
A Data Scientist currently working at Turing. Leading R&D initiatives focused on improving Large Language Models through structured evaluation, ablation analysis, and high-quality data generation. Building automated model retraining workflows and scalable AI systems that enhance accuracy, efficiency, and user satisfaction. Previously worked at Kyndryl (formerly IBM), developing ML/NLP solutions for intelligent ticket routing, spam detection, and large-scale data processing systems that reduced operational costs and improved resolution time.
Experience
Data Scientist
Turing
- Directed a team in solving complex NP-hard optimization problems using systematic methodologies and Large Language Model (LLM)-driven development frameworks; ensured production-quality implementations with efficient algorithms, clean architecture, unit/integration test suites, and strict coding standards compliance.
- Applied rigorous ablation analysis during AI model development to quantify feature importance and training strategy effectiveness, resulting in streamlined models with improved accuracy and computational efficiency.
- Built and maintained reusable Pulumi-based IaC modules to standardize infrastructure provisioning, improving environment consistency, reducing configuration drift, and accelerating deployment cycles.
- Built an automated model retraining workflow that ingests new data, monitors distribution shifts, and updates models to adapt to changing data patterns, enhancing performance and reducing manual intervention.
- Spearheaded efforts to compare AI completions, provide logical explanations for ratings, and craft ideal responses.
- Developed an AI-powered comment moderation and insight platform leveraging NLP to detect abusive language, summarize long threads, and extract emerging trends, improving moderation efficiency and community engagement.
- Engineered adversarial and real-world task prompts in Python, data processing, and machine learning workflows to benchmark model limitations; applied structured human-feedback evaluation techniques to assess and rank multi-candidate outputs.
- Created domain-diverse stress tests to evaluate LLM reasoning in coding and analytical tasks, leveraging structured RLHF evaluation protocols to score, compare, and refine multi-output generations.
- Generated data to enable an AI system to effectively utilize a code execution environment, ensuring it can execute the appropriate code based on user prompts and deliver accurate, context-specific responses within the conversation.
Data Scientist (Contractor)
Kyndryl
- Served on the system design team that upgraded the existing product into scalable, containerized microservices, improving efficiency and speeding up development and improvement of independent features.
- Developed a distributed data processing pipeline using Apache Kafka for ingesting raw event streams and Elasticsearch for cleaned, structured indexing, enabling reliable data transformation and fast downstream querying.
- Worked with cross-functional teams to ensure robust product development and client satisfaction.
- Engaged with clients to gather feedback and understand their requirements.
- Automated AI pipelines so that they adjust to data shift and learn new patterns quickly.
Data Scientist (Contractor)
IBM
- Used state-of-the-art tools, techniques, and algorithms to develop and improve automated ticket routing and assignment.
- Developed a binary text classifier to detect spam tickets, saving the time and money otherwise spent resolving them.
- Implemented several clustering techniques on text data to generate time-based alerts for similar problems/incidents.
- Updated the word embeddings for text data, increasing performance by 15% with vectors one-fifth the length of the previous ones.
- Improved the performance of existing machine learning and deep learning systems and ensured they were production-ready.
- Developed scripts to automate the analysis, cleaning, and visualization of text data.
- Checked for security issues during development and ensured systems passed the penetration-testing phase.
- Developed a set of useful practices for handling imbalanced datasets.
Graduate Analytics Intern
IBM
- Analyzed and manipulated datasets
- Developed, executed, and validated machine learning models
- Reported key findings and model performance on a daily/weekly basis
- Improved performance and fixed bugs in existing scripts
Education
Bachelor of Science in Computer Science
FAST - National University of Computer and Emerging Sciences
Specialized in Deep Learning and Data Analysis. Built foundational expertise in neural networks (CNNs, RNNs, LSTMs), natural language processing, and data engineering using industry-standard Python ecosystems (TensorFlow, PyTorch, Pandas).
Skills & Tech Stack
Deep Learning & NLP
Large Language Models, BERT, Transformers (Attention), Word Embeddings (Word2Vec, FastText), CNN/RNN architectures, GANs, and IBM Watson.
Machine Learning
Predictive Modeling, Clustering (K-Means, DBSCAN), Dimensionality Reduction (PCA, t-SNE), Ensembles (Random Forests), and SVMs.
Python Ecosystem
PyTorch, TensorFlow, Keras, FastAI, Scikit-Learn, Pandas, NumPy, and data scraping utilities (Scrapy, BeautifulSoup).
Cloud & Data Engineering
Google Cloud Platform (GCP), Apache Kafka streams, Ubuntu/Linux server administration, and Jupyter environments.
Certifications & Accomplishments
Natural Language Processing Nanodegree
Udacity | June 2020
Deep Learning Nanodegree
Udacity | January 2020
Deep Learning Specialization
Coursera | August 2018
Top Performer Award
Kyndryl | September 2022
Top performer award for the year 2021-2022.
Udacity Nanodegree Scholarship
Facebook | September 2019
Among the top 200 of 10,000 contestants, selected for a 100% scholarship to the Deep Learning Nanodegree.
Contact
Get In Touch
Interested in collaborating or have a question about AI and Data Science? Drop a message.