NLP Basics Comprehensive Quiz & Projects

30 questions on the NLP Basics tutorial.

Question 1: What is the difference between Stemming and Lemmatization in text preprocessing?

A. Stemming is used for semantic translation, while Lemmatization is used for syntax parsing.
B. Stemming cuts off suffixes crudely (e.g. 'studies' -> 'studi'), while Lemmatization returns the dictionary lemma (e.g. 'studies' -> 'study') using vocabulary analysis. — (correct answer)
C. Stemming works on sentences, while Lemmatization works only on characters.
D. Stemming is faster and always yields grammatically correct base words.

Explanation: Lemmatization uses grammatical rules and dictionaries to find the proper base word. Stemming applies simple, heuristic chops.
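
Sketch: the contrast can be shown with a toy suffix chopper next to a toy dictionary lookup. The suffix list and the mini lemma dictionary are hypothetical stand-ins, not the rules of any real stemmer or lemmatizer.

```python
# Toy illustration only: real stemmers and lemmatizers use far richer rules.
SUFFIXES = ("es", "ed", "ing", "s")  # hypothetical suffix list, checked in order

# Hypothetical mini-dictionary standing in for a real lemma lexicon.
LEMMAS = {"studies": "study", "better": "good", "ran": "run"}


def crude_stem(word):
    """Chop the first matching suffix, with no check that the result is a word."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def toy_lemmatize(word):
    """Look the word up in the dictionary; fall back to the word itself."""
    return LEMMAS.get(word, word)


print(crude_stem("studies"))     # studi
print(toy_lemmatize("studies"))  # study
```

The stemmer leaves 'better' untouched because no suffix matches, while the dictionary lookup maps it to 'good'.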

Question 2: In Term Frequency-Inverse Document Frequency (TF-IDF), what does the IDF component measure?

A. The number of times a target term appears in a single document.
B. The grammatical complexity of the text.
C. The importance or rarity of a word across the entire corpus of documents. — (correct answer)
D. The length of the document.

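Explanation: IDF measures how rare a term is across the whole corpus. Terms that appear in almost every document (like 'the') get a low weight, while terms that appear in only a few documents get a high weight. Term frequency (TF), by contrast, counts occurrences within a single document.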
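
Sketch: a minimal TF-IDF calculation, assuming the plain formula idf = log(N / df) and a made-up three-document corpus. Libraries often use smoothed variants of this formula, so their numbers may differ.

```python
import math

# Hypothetical toy corpus: each document is a list of tokens.
docs = [
    ["the", "cat", "sat"],
    ["the", "dog", "barked"],
    ["the", "cat", "purred"],
]


def idf(term, documents):
    """Plain idf = log(N / df); assumes the term occurs in at least one document."""
    df = sum(1 for doc in documents if term in doc)
    return math.log(len(documents) / df)


def tf(term, doc):
    """Relative frequency of the term inside one document."""
    return doc.count(term) / len(doc)


def tf_idf(term, doc, documents):
    return tf(term, doc) * idf(term, documents)


for term in ("the", "cat", "dog"):
    print(term, round(idf(term, docs), 3))
```

'the' appears in every document, so its IDF is 0; 'dog' appears in only one, so it receives the highest weight of the three.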

Question 3: How do Word Embeddings (like Word2Vec) capture semantic meaning in vector space?

A. By mapping each word to a unique random integer value.
B. By positioning words that appear in similar contexts close to each other in high-dimensional vector space. — (correct answer)
C. By alphabetical sorting matrices.
D. By hashing words into 128-bit binary signatures.

Explanation: Word embeddings represent words as vectors. The distance and angle between vectors represent semantic and contextual relationships.

Question 4: What limitation of Recurrent Neural Networks (RNNs) led to the development of Transformer architectures?

A. RNNs do not support training on text documents.
B. RNNs struggle with long-term dependencies due to the vanishing gradient problem and process tokens sequentially, blocking parallel training. — (correct answer)
C. RNNs require high GPU memory for basic tokenization.
D. RNNs cannot be used for translation tasks.

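Explanation: RNNs pass information step by step, so gradients shrink over long sequences and tokens cannot be processed in parallel. Transformers process the whole sequence at once and use self-attention to relate any two tokens directly, however far apart they are within the context window.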

Question 5: In Transformer models, what is the purpose of the Self-Attention mechanism?

A. To monitor model accuracy during validation loops.
B. To prioritize inputs from the system prompt over user files.
C. To calculate how much focus or weight one token should place on every other token in the sequence when encoding meaning. — (correct answer)
D. To clean spelling errors in raw input strings.

Explanation: Self-attention calculates correlation scores between all words in a sentence, capturing context dynamically (e.g. 'bank' of a river vs financial 'bank').
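
Sketch: scaled dot-product attention in plain Python, computing softmax(Q K^T / sqrt(d)) V. As a simplifying assumption, the same made-up vectors are used as queries, keys, and values; a real Transformer would first apply learned projection matrices.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that are positive and sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def self_attention(queries, keys, values):
    """Return (weights, outputs), with one row per token."""
    d = len(keys[0])
    weights = [
        softmax([dot(q, k) / math.sqrt(d) for k in keys]) for q in queries
    ]
    outputs = [
        [sum(w * v[j] for w, v in zip(row, values)) for j in range(len(values[0]))]
        for row in weights
    ]
    return weights, outputs


# Hypothetical 2-d vectors for three tokens; here Q = K = V for simplicity.
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weights, outputs = self_attention(x, x, x)
```

Each row of `weights` says how much one token attends to every token in the sequence, and each output vector is the corresponding weighted average of the value vectors.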

Question 6: What is Tokenization in NLP?

A. Converting script files to binary formats.
B. The process of splitting a continuous string of text into individual units (tokens) like words or subwords. — (correct answer)
C. Checking grammatical errors.
D. Generating security keys for APIs.

Explanation: Tokenization is the foundational step, transforming raw sentences into sequences of tokens that a model can process.
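
Sketch: a minimal word-level tokenizer built on a regular expression. The pattern is an illustrative assumption; production tokenizers handle contractions, URLs, and subwords with more care.

```python
import re


def word_tokenize(text):
    """Split text into runs of word characters or single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", text)


print(word_tokenize("Tokenizers split text, fast!"))
# ['Tokenizers', 'split', 'text', ',', 'fast', '!']
```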

Question 7: What are 'Stop Words' in text processing?

A. Words that trigger syntax errors.
B. Common words (like 'and', 'the', 'is') that are often filtered out before processing because they carry little semantic value. — (correct answer)
C. Key vocabulary words in documents.
D. Commands that stop text parsers.

Explanation: Removing stop words reduces vocabulary noise, allowing algorithms to focus on content words.
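
Sketch: stop-word filtering over a token list. The stop-word set below is a tiny hypothetical sample, not a complete list for any language.

```python
# Tiny hypothetical stop-word list for illustration.
STOP_WORDS = {"and", "the", "is", "a", "in", "on"}


def remove_stop_words(tokens):
    """Keep only tokens whose lowercase form is not a stop word."""
    return [t for t in tokens if t.lower() not in STOP_WORDS]


print(remove_stop_words(["The", "cat", "is", "on", "the", "mat"]))  # ['cat', 'mat']
```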

Question 8: What is a Named Entity Recognition (NER) task?

A. Naming new variables in code files.
B. Identifying and classifying key entities in text into predefined categories (e.g. Names, Dates, Organizations). — (correct answer)
C. Parsing sentences to identify parts of speech.
D. Translating words to different languages.

Explanation: NER models scan inputs to extract structure (e.g., extracting 'Apple' as an Organization).

Question 9: In NLP, what is a 'Corpus'?

A. A text compiler software.
B. A large, structured collection of text documents used for training and linguistic analysis. — (correct answer)
C. A database table schema.
D. The body of a single function.

Explanation: A corpus represents the dataset of text documents used to train language models.

Question 10: What does a Part-of-Speech (POS) Tagger do?

A. Compiles text stylesheets.
B. Analyzes words in a sentence and labels their grammatical class (noun, verb, adjective) based on context. — (correct answer)
C. Searches documents for target terms.
D. Translates sentences into binary code.

Explanation: POS tags map syntactic roles, crucial for understanding sentence patterns.

Question 11: What is the difference between Word2Vec's CBOW (Continuous Bag of Words) and Skip-gram architectures?

A. CBOW is used only for database tables.
B. CBOW predicts the target word from surrounding context, while Skip-gram predicts the surrounding context from a target word. — (correct answer)
C. Skip-gram is slower and deprecated.
D. CBOW works only on character tokens.

Explanation: CBOW is faster and works well for frequent words; Skip-gram performs better on rare words.
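
Sketch: how the two architectures frame their training examples from the same sentence. Only the example generation is shown, with an assumed window size of 1; the neural network that learns the embeddings is omitted.

```python
def training_pairs(tokens, window=1):
    """Build CBOW (context -> target) and Skip-gram (target -> context) examples."""
    cbow, skipgram = [], []
    for i, target in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        context = [tokens[j] for j in range(lo, hi) if j != i]
        cbow.append((context, target))                  # many inputs, one output
        skipgram.extend((target, c) for c in context)   # one input, one output each
    return cbow, skipgram


cbow, skipgram = training_pairs(["the", "cat", "sat"])
print(cbow)      # [(['cat'], 'the'), (['the', 'sat'], 'cat'), (['cat'], 'sat')]
print(skipgram)  # [('the', 'cat'), ('cat', 'the'), ('cat', 'sat'), ('sat', 'cat')]
```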

Question 12: What are N-grams in text mining?

A. Mathematical formulas for measuring file sizes.
B. Contiguous sequences of N items (words or characters) from a given sample of text. — (correct answer)
C. The number of layers in a neural net.
D. Key-value indices.

Explanation: Unigrams (1-grams), Bigrams (2-grams), and Trigrams (3-grams) capture local sequences.
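
Sketch: extracting n-grams with a sliding window. The same helper works for word tokens and for characters.

```python
def ngrams(items, n):
    """Return all contiguous runs of n items, as tuples."""
    return [tuple(items[i:i + n]) for i in range(len(items) - n + 1)]


tokens = ["natural", "language", "processing", "rocks"]
print(ngrams(tokens, 2))
# [('natural', 'language'), ('language', 'processing'), ('processing', 'rocks')]
print(ngrams(list("cat"), 2))  # [('c', 'a'), ('a', 't')]
```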

Question 13: How is Cosine Similarity used in NLP?

A. To check text file compression.
B. To measure the semantic similarity between two word or document vectors by calculating the cosine of the angle between them. — (correct answer)
C. To encrypt token streams.
D. To route API requests.

Explanation: A value of 1 indicates identical vector directions, 0 indicates orthogonal (unrelated) vectors, and -1 indicates opposite directions.
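
Sketch: cosine similarity computed directly from its definition, dot(a, b) / (|a| * |b|). The vectors are made up, and the function assumes neither vector is all zeros.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


print(cosine_similarity([1, 0], [0, 1]))   # 0.0 (orthogonal)
print(cosine_similarity([1, 0], [-1, 0]))  # -1.0 (opposite)
```

Because only the angle matters, scaling a vector (for example, [1, 2] versus [2, 4]) leaves the similarity at 1 up to floating-point rounding.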

Question 14: What does the BLEU score evaluate in NLP applications?

A. The processing latency of translators.
B. The quality of machine-translated text by comparing it against human reference translations. — (correct answer)
C. The storage capacity of text tables.
D. The count of spelling errors.

Explanation: BLEU measures n-gram overlaps between model output and human reference standards.
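
Sketch: the clipped n-gram precision that BLEU is built on, against a single reference. This is only one ingredient; the full score combines precisions for several n-gram sizes and applies a brevity penalty, which this sketch leaves out.

```python
from collections import Counter


def clipped_precision(candidate, reference, n=1):
    """Share of candidate n-grams found in the reference, with counts clipped."""
    cand = Counter(tuple(candidate[i:i + n]) for i in range(len(candidate) - n + 1))
    ref = Counter(tuple(reference[i:i + n]) for i in range(len(reference) - n + 1))
    # Each candidate n-gram is credited at most as often as it occurs in the reference.
    overlap = sum(min(count, ref[gram]) for gram, count in cand.items())
    total = sum(cand.values())
    return overlap / total if total else 0.0


candidate = "the the the cat".split()
reference = "the cat sat".split()
print(clipped_precision(candidate, reference, n=1))  # 0.5
```

Clipping is what stops the repeated 'the' from inflating the score: only one of the three occurrences is credited.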

Question 15: What is the purpose of Text Normalization?

A. Sorting files alphabetically.
B. Standardizing text inputs (e.g., converting to lowercase, removing punctuation, expanding contractions) to reduce vocabulary variants. — (correct answer)
C. Validating string lengths.
D. Translating words to English.

Explanation: Normalization ensures 'Car', 'car', and 'car!' map to the identical token key.
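
Sketch: a minimal normalizer that lowercases and strips ASCII punctuation. Contraction expansion is not covered here; in a fuller pipeline it would need to run before punctuation removal so that apostrophes are still present.

```python
import string

# Translation table that deletes every ASCII punctuation character.
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def normalize(token):
    """Lowercase the token and remove ASCII punctuation."""
    return token.lower().translate(_PUNCT_TABLE).strip()


print({normalize(t) for t in ["Car", "car", "car!"]})  # {'car'}
```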

Question 16: Which task involves determining the emotional tone behind a body of text (e.g. positive, negative, neutral)?

A. Language Modeling
B. Sentiment Analysis — (correct answer)
C. Syntax Parsing
D. Tokenization

Explanation: Sentiment analysis classifies subjectivity and emotion, heavily used in customer reviews.

Question 17: How do Recurrent Neural Networks (RNNs) capture sequential context in text?

A. By processing words in parallel.
B. By passing a hidden state vector forward through time steps, carrying memory of previous tokens. — (correct answer)
C. By storing strings in database tables.
D. By compiling text into binary blobs.

Explanation: The hidden state acts as memory, updating at each token step to capture context.
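
Sketch: a one-unit recurrent cell with scalar input and state, using the update h_new = tanh(w_x * x + w_h * h + b). The weights are hand-picked assumptions rather than trained values.

```python
import math


def rnn_step(x, h, w_x, w_h, b):
    """One time step: mix the current input with the previous hidden state."""
    return math.tanh(w_x * x + w_h * h + b)


def run_rnn(inputs, w_x=0.5, w_h=0.8, b=0.0):
    """Feed inputs one at a time, carrying the hidden state forward."""
    h = 0.0
    states = []
    for x in inputs:
        h = rnn_step(x, h, w_x, w_h, b)
        states.append(h)
    return states


# Only the first input is non-zero, yet later states still depend on it.
states = run_rnn([1.0, 0.0, 0.0])
print([round(s, 3) for s in states])
```

The later states stay positive but shrink at each step, which shows both the memory effect and how the trace of an early token fades over time.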

Question 18: What is a 'Lexicon' in linguistic processing?

A. A text compiling program.
B. A dictionary containing vocabulary words and their associated properties or sentiment scores. — (correct answer)
C. A database table schema.
D. An API routing controller.

Explanation: Lexicons store word properties (e.g., a lexicon of positive sentiment terms).

Question 19: What does the 'attention weight' represent in Seq2Seq models?

A. The memory speed of GPU grids.
B. A score indicating how much attention the decoder should pay to specific encoder input tokens when generating a target output token. — (correct answer)
C. The priority of the system prompt.
D. The size of the vocabulary.

Explanation: Attention allows the model to focus on relevant context words dynamically during generation.

Question 20: In NLP, what does 'Bag of Words' (BoW) represent?

A. A folder containing text documents.
B. A simple text representation that counts word frequencies, ignoring grammatical structure and word order. — (correct answer)
C. A list of stop words.
D. An encryption format for strings.

Explanation: BoW builds a vocabulary list and counts how often each word occurs, without regard to word order.
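
Sketch: building count vectors over a shared vocabulary. The two example documents are made up to show that different word orders give identical vectors.

```python
from collections import Counter


def bag_of_words(docs):
    """Return the sorted vocabulary and one count vector per document."""
    vocab = sorted({tok for doc in docs for tok in doc})
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append([counts[term] for term in vocab])
    return vocab, vectors


docs = [["dog", "bites", "man"], ["man", "bites", "dog"]]
vocab, vectors = bag_of_words(docs)
print(vocab)    # ['bites', 'dog', 'man']
print(vectors)  # [[1, 1, 1], [1, 1, 1]]
```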

Question 21: What does Dependency Parsing accomplish?

A. It checks dependencies in package.json files.
B. It maps the grammatical relationships between words in a sentence, establishing head-dependency trees. — (correct answer)
C. It groups documents by topic.
D. It translates words.

Explanation: Dependency trees show how verbs, nouns, and adjectives relate structurally.

Question 22: What is the difference between character-level and word-level tokenization?

A. Character tokenization is synchronous, word is asynchronous.
B. Character tokenization splits text into individual letters/symbols, reducing out-of-vocabulary terms but increasing sequence lengths. — (correct answer)
C. Word tokenization is deprecated.
D. Character tokenization is only used for databases.

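Explanation: Word-level tokenization keeps sequences short but cannot represent words outside its vocabulary. Subword tokenization (e.g. Byte-Pair Encoding) sits in between, balancing vocabulary size and sequence length.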

Question 23: What is a 'Stop Word List'?

A. A list of restricted database commands.
B. A pre-compiled list of common words to be ignored during text processing tasks. — (correct answer)
C. An index of document slugs.
D. A list of system configuration keys.

Explanation: Stop word lists hold language-specific filler words (e.g., 'a', 'in', 'on').

Question 24: What is Language Modeling?

A. Programming models to write code.
B. The task of predicting the probability of a sequence of words (or predicting the next word in a sequence). — (correct answer)
C. Translating text between domains.
D. Checking grammar in documents.

Explanation: Language models learn probability distributions over word sequences (next-token prediction).

Question 25: What is the purpose of Word Sense Disambiguation (WSD)?

A. Correcting spelling errors in strings.
B. Identifying which semantic meaning of a word is intended based on the surrounding context (e.g. 'bass' fish vs 'bass' instrument). — (correct answer)
C. Compacting vocabulary arrays.
D. Encrypting token payloads.

Explanation: WSD resolves word ambiguities by using the surrounding context to select the intended sense.

Question 26: What does 'NLP' stand for?

A. Network Layer Protocol
B. Natural Language Processing — (correct answer)
C. Numerical Log Parser
D. Node Loop Process

Explanation: Natural Language Processing combines computer science and linguistics to enable computers to process human language.

Question 27: How does Byte-Pair Encoding (BPE) build a subword vocabulary?

A. By mapping characters to random integer numbers.
B. By iteratively merging the most frequent pairs of bytes or characters in a text corpus. — (correct answer)
C. By checking dictionary definitions.
D. By encrypting string payloads.

Explanation: BPE builds subwords dynamically, letting tokenizers handle unknown words gracefully.
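
Sketch: one BPE merge step over a toy corpus, where each word is stored as a tuple of symbols with a frequency. The words and counts are made up; a real tokenizer repeats this step until it reaches a target vocabulary size.

```python
from collections import Counter


def most_frequent_pair(words):
    """Count adjacent symbol pairs, weighted by word frequency."""
    pairs = Counter()
    for symbols, freq in words.items():
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return max(pairs, key=pairs.get)


def merge_pair(words, pair):
    """Replace every occurrence of the pair with one merged symbol."""
    merged = {}
    for symbols, freq in words.items():
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        merged[tuple(out)] = merged.get(tuple(out), 0) + freq
    return merged


# Hypothetical corpus: word (split into characters) -> frequency.
words = {tuple("hug"): 10, tuple("pug"): 5, tuple("pun"): 12,
         tuple("bun"): 4, tuple("hugs"): 5}

first = most_frequent_pair(words)           # ('u', 'g')
words_after = merge_pair(words, first)
second = most_frequent_pair(words_after)    # ('u', 'n')
```

After the first merge, 'ug' is a single symbol, so an unseen word such as 'mug' could later be represented as 'm' plus 'ug' instead of an unknown token.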

Question 28: What does a perplexity score measure in language models?

A. The processing latency of queries.
B. How well a probability model predicts a sample text (lower perplexity represents better prediction accuracy). — (correct answer)
C. The number of layers in a neural net.
D. The vocabulary array size.

Explanation: Perplexity is the exponentiated cross-entropy loss, indicating token predictability.
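
Sketch: perplexity computed from the probabilities a model assigned to the tokens that actually occurred. The probability lists are invented for illustration.

```python
import math


def perplexity(token_probs):
    """exp of the average negative log-probability of the observed tokens."""
    nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(nll)


confident = perplexity([0.9, 0.8, 0.95])
unsure = perplexity([0.2, 0.1, 0.3])
print(confident < unsure)  # True
```

A model that assigns probability 0.25 to every token has a perplexity of about 4, as if it were choosing uniformly among four options at each step.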

Question 29: What is the difference between extractive and abstractive summarization?

A. Extractive copy-pastes key sentences from the source directly, while Abstractive paraphrases and generates new sentences to summarize the text. — (correct answer)
B. Abstractive is faster and uses less CPU memory.
C. Extractive is unsupervised, while Abstractive is supervised.
D. Extractive only works on database tables.

Explanation: Abstractive models require generative architectures to write new summaries.

Question 30: Which of the following is a popular Python library for standard NLP tasks like tokenization and parsing?

A. NumPy
B. NLTK (or spaCy) — (correct answer)
C. PyTorch
D. Flask

Explanation: NLTK and spaCy are standard libraries for natural language processing.
