🟣 MongoDB interview questions and answers to help you prepare for your next technical interview in 2024.
MongoDB is a robust, document-oriented NoSQL database designed for high performance, scalability, and developer agility.
Here is the Python code:
from pymongo import MongoClient
client = MongoClient() # Connects to default address and port
db = client.get_database('mydatabase')
# Insert a record
collection = db.get_collection('mycollection')
inserted_id = collection.insert_one({'key1': 'value1', 'key2': 'value2'}).inserted_id
# Query records
for record in collection.find({'key1': 'value1'}):
    print(record)
# Update record
update_result = collection.update_one({'_id': inserted_id}, {'$set': {'key2': 'new_value'}})
print(f"Modified {update_result.modified_count} records")
# Delete record
delete_result = collection.delete_one({'key1': 'value1'})
print(f"Deleted {delete_result.deleted_count} records")
While both MongoDB and relational databases handle data, they do so in fundamentally different ways. Let's explore the key distinctions.
In MongoDB, data units are organized into collections, which group related documents. Each document corresponds to a single record and maps to fields or key-value pairs.
Data in MongoDB is stored in BSON (Binary JSON) format, which supports a maximum nesting depth of 100 levels. This means documents can contain embedded documents and arrays nested up to 100 levels deep; it is a limit on depth, not on the number of fields a document may hold.
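The limit above is about nesting depth, so here is a rough, illustrative sketch (pure Python, not part of any MongoDB driver; the depth-counting convention is an assumption) of how a client could estimate a candidate document's nesting depth before inserting it:

```python
def bson_depth(value):
    """Return the nesting depth of a dict/list structure.

    A scalar counts as depth 0; each enclosing document (dict)
    or array (list) adds one level. This mirrors how the BSON
    100-level nesting limit is counted, as an illustration only.
    """
    if isinstance(value, dict):
        return 1 + max((bson_depth(v) for v in value.values()), default=0)
    if isinstance(value, list):
        return 1 + max((bson_depth(v) for v in value), default=0)
    return 0

# nesting depth 3: document -> comments array -> comment document
doc = {"author": {"name": "John Doe"}, "comments": [{"user": "Alice"}]}
print(bson_depth(doc))
```

In practice documents rarely approach 100 levels, so a check like this is seldom needed; the sketch is only meant to make the limit concrete.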
Here is a nested document:
{
  "_id": "123",
  "title": "My Blog Post",
  "author": {
    "name": "John Doe",
    "bio": "Tech enthusiast"
  },
  "comments": [
    {
      "user": "Alice",
      "text": "Great post"
    },
    {
      "user": "Bob",
      "text": "A bit lengthy!"
    }
  ]
}
In the example above, the "author" field is an embedded document (or sub-document), and the "comments" field is an array of documents.
Ad-Hoc Schema: Documents in a collection don't need to have the same fields, providing schema flexibility.
Atomicity at the Document Level: A write to a single document is atomic: all of its modifications succeed or fail together as one unit of work, giving ACID-style guarantees (Atomicity, Consistency, Isolation, Durability) within that document.
Index Support: Increases query performance.
Support for Embedded Data: You can nest documents and arrays.
Reference Resolution: Documents can reference each other, but MongoDB does not enforce referential integrity. If a referenced document is modified or deleted, the application must update or remove the references itself, typically across multiple operations or within a multi-document transaction.
Sharding and Replication: For horizontal scaling and high availability.
One-to-One: Typically achieved with embedded documents.
One-to-Many (Parent-Child): This can be modelled using embedded documents in the parent.
One-to-Many (Referenced): Achieved through referencing, where several documents contain a field referencing a single document. For better efficiency with frequent updates, consider referencing.
Many-to-Many: Modeled similarly to referenced "One-to-Many" relationships, typically with arrays of references on one or both sides.
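The two one-to-many styles above can be sketched with plain Python dicts (illustrative document shapes only; the field names are assumptions, not a fixed schema):

```python
# Embedded (parent-child): comments live inside the post document,
# so one read fetches everything.
post_embedded = {
    "_id": 1,
    "title": "My Blog Post",
    "comments": [
        {"user": "Alice", "text": "Great post"},
        {"user": "Bob", "text": "A bit lengthy!"},
    ],
}

# Referenced: each comment stores the _id of its parent post.
post = {"_id": 1, "title": "My Blog Post"}
comments = [
    {"_id": 10, "post_id": 1, "user": "Alice", "text": "Great post"},
    {"_id": 11, "post_id": 1, "user": "Bob", "text": "A bit lengthy!"},
]

# Resolving the reference is an extra lookup the application performs.
comments_for_post = [c for c in comments if c["post_id"] == post["_id"]]
```

Embedding favors read locality; referencing keeps frequently updated children out of the parent document, which matches the guidance above about preferring references under frequent updates.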
You should avoid "repeatable patterns", such as splitting similar data across parallel arrays or collections, to keep data manipulation and query operations simple. For example, instead of maintaining separate "users" and "admins" collections for what is essentially the same kind of document, store them all in a single collection with a "role" field. This prevents data redundancy and keeps similar documents consistent; redundant storage, or artificially separating non-redundant data, can lead to inconsistencies and increase the effort required for maintenance.
In MongoDB, a document is the basic data storage unit. It's a JSON-like structure that stores data in key-value pairs known as fields.
Each document contains an _id field, a primary key that is automatically indexed for fast lookups.
Here is the document structure:
{
  "_id": 1,
  "name": "John Doe",
  "age": 30,
  "email": "[email protected]",
  "address": {
    "city": "Example",
    "zip": "12345"
  },
  "hobbies": ["golf", "reading"]
}
Documents are grouped into collections. Each collection acts as a container with a unique namespace within a database. Collections don't enforce a predefined schema, which allows for flexibility in data modeling.
Flexibility: Documents can be tailored to the specific data needs of the application without adherence to a rigid schema.
Data Locality: Related data, like a user's profile and their posts, can be stored in one document, enhancing performance by minimizing lookups.
JSON Familiarity: Documents, being JSON-like, enable easier transitions between application objects and database entities.
Indexing: Fields within documents can be indexed, streamlining search operations.
Transaction Support: Modern versions of MongoDB offer ACID-compliant, multi-document transactions that ensure data consistency.
Consider an online library. Instead of having separate tables for users, books, and checkouts as in a relational database, you could store all the pertinent data about a user, including their checked-out books, in a single document within a users collection:
{
  "_id": 1,
  "name": "John Doe",
  "email": "[email protected]",
  "address": { "city": "Example", "zip": "12345" },
  "checkedOutBooks": [
    { "bookId": 101, "dueDate": "2022-02-28" },
    { "bookId": 204, "dueDate": "2022-03-15" }
  ]
}
This approach enables swift retrieval of all pertinent user information in one go.
In MongoDB, data is stored as documents grouped into collections, ensuring flexibility and efficiency in data modeling.
A MongoDB database is a document-oriented, NoSQL database consisting of collections, each of which in turn comprises documents.
Synonymous with a record, a document is the main data storage unit in MongoDB: a set of key-value pairs, called fields.
Each document carries a unique identifier in its _id field; by default this is an autogenerated ObjectId, which ensures every document is distinct.
Unlike SQL databases, where each row of a table follows the same schema, a document can be more fluid, accommodating fields as required.
Considerations When Choosing the Level of Normalization:
Optimized Reads: Normalization into separate collections may be beneficial if there are large amounts of data that might not always have to be fetched.
Batch Inserts and Updates: Denormalization often leads to simpler write operations. If there will be a lot of changes or inserts, denormalization can be more efficient.
Atomicity: When data that belongs together is split into different collections, ensuring atomicity can become difficult.
A field is a single piece of data within a document; it is analogous to a column in a relational database.
Field Type: MongoDB supports multiple field types, including arrays.
Limit on Nested Fields: Documents can be nested, allowing sub-documents within a main document. However, there is a depth limitation: documents cannot be embedded beyond 100 levels.
MongoDB is often regarded as schema-less, but a more accurate description is that it's flexible. While documents within a single collection can have different fields, a robust schema design process is still essential.
Adapting to Evolving Schemas:
Versioning: Manage schema changes and versioning in the application layer.
Schema Validation: Introduced in MongoDB 3.2, this feature allows for the application of structural rules to documents.
Education and Training: Properly educating developers on the use of a database can minimize potential misuse of its flexibility.
Use of Techniques to Ensure Data Integrity: Techniques such as double-entry bookkeeping can assure data accuracy, especially when dealing with multiple, occasionally outdated records.
Normalization: Seeks to reduce redundancy and improve data consistency.
Denormalization: Emphasizes performance gains. Redundancies are knowingly introduced for optimized and rapid reads.
Use Cases Dictate: Neither is definitively superior; their suitability depends on the specific use case.
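The trade-off can be made concrete with a small, hedged sketch (plain Python dicts with assumed field names, not a prescribed schema):

```python
# Normalized: author data lives in one place; posts reference it by _id.
authors = {"a1": {"_id": "a1", "name": "John Doe"}}
posts_normalized = [
    {"_id": "p1", "title": "Post 1", "author_id": "a1"},
    {"_id": "p2", "title": "Post 2", "author_id": "a1"},
]
# Reading a post's author costs a second lookup...
author_name = authors[posts_normalized[0]["author_id"]]["name"]

# Denormalized: the author's name is copied into every post,
# so a single read returns everything...
posts_denormalized = [
    {"_id": "p1", "title": "Post 1", "author_name": "John Doe"},
    {"_id": "p2", "title": "Post 2", "author_name": "John Doe"},
]
# ...but renaming the author now means touching every post.
for p in posts_denormalized:
    p["author_name"] = "John D. Doe"
```

Which shape wins depends on whether the workload is dominated by reads of the combined data or by updates to the shared piece, exactly as the use-case point above says.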
The default port number for MongoDB is 27017. While it is possible to run multiple instances of MongoDB on the same machine, each instance must have its unique port number to ensure they don't conflict.
MongoDB ensures high availability and disaster recovery through a robust data architecture and a distributed system model. It integrates various mechanisms to maintain data integrity, uptime assurances, and data redundancy.
Replica Sets: Clusters of MongoDB nodes that keep redundant copies of the data; if the primary fails, an automatic failover election promotes a secondary, maintaining availability and data consistency.
WiredTiger Storage Engine: It powers numerous features including data durability, in-memory storage, and compression.
Oplog: Short for "operations log", it records all write operations in an append-only manner.
Write Concerns: These are rules that determine the level of acknowledgment required for write operations.
Read Preferences: They define which nodes in a cluster can satisfy read operations.
Data Centers: Hardware resilience can be achieved by distributing nodes across multiple data centers.
Backups and Restores: MongoDB offers built-in mechanisms to backup and restore data, further aiding in disaster recovery.
Monitoring Tools: For performance tracking and potential issue detection.
Technology Agnostic: Can deploy on multi-cloud, hybrid and on-premises architectures.
Restore: Recovering data from a previously captured backup. A plain restore does not include changes made after the backup was taken unless those changes are replayed from the oplog.
Oplog Replays: This involves using oplogs that track changes, ensuring that even after a cluster restart, any missed transactions are reinstated.
Snapshotting: It is a consistent snapshot of data across the nodes in the replica set.
Here is the Python code:
# Import MongoClient and WriteConcern from pymongo.
from pymongo import MongoClient
from pymongo.write_concern import WriteConcern
# Establish a connection to the MongoDB server (an oplog exists only on replica-set members).
client = MongoClient('mongodb://localhost:27017/')
# Get the collection from the test database with a 'majority' write concern applied.
collection = client.test.get_collection('test_collection', write_concern=WriteConcern(w='majority'))
# Insert a document; the write is acknowledged once a majority of nodes have replicated it.
result = collection.insert_one({'test_key': 'test_value'})
# Fetch the oplog entries associated with inserts into the collection.
oplog_cursor = client.local['oplog.rs'].find({'ns': 'test.test_collection', 'op': 'i'})
# Count the entries to confirm the operation was recorded in the oplog.
operation_count = len(list(oplog_cursor))
Indexes are employed in MongoDB to optimize database queries by providing faster access to data. Without indexes, MongoDB performs full collection scans.
Batch Insert Operations on an Indexed Collection: Every insert must also update each index on the collection, so bulk loads into heavily indexed collections can become a performance bottleneck.
Unique Indexes: Beneficial when a field, such as an email address, must never repeat across documents; the index enforces the constraint at write time.
Geospatial Indexes: 2d and 2dsphere indexes serve location-based data and optimize queries such as proximity and containment searches.
Index Overuse: Too many indexes can degrade write performance.
Index Size: Larger indexes consume more RAM and might slow down read and write operations.
Index Inefficiency: Inaccurate or non-selective index usage can render them ineffective.
Write Penalties: Indexes incur an overhead during writes, impacting their efficiency in write-heavy systems.
Index Maintenance: Regular maintenance, like rebuilding or reorganizing indexes, is often necessary.
Workload Misalignment: An index might not be beneficial if it's not aligned with the actual query workload.
Keep only the indexes your queries actually require, and remove any unnecessary ones.
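The read/write trade-off above can be shown with a hedged, toy model in pure Python (a dict standing in for an index; MongoDB actually uses B-tree structures, so this is only an analogy):

```python
# A small "collection" of documents.
documents = [
    {"_id": i, "country": "USA" if i % 2 == 0 else "Canada"}
    for i in range(1000)
]

# Without an index, a query is a full collection scan.
usa_scan = [d for d in documents if d["country"] == "USA"]

# With an "index": a lookup table from field value to documents.
index = {}
for d in documents:
    index.setdefault(d["country"], []).append(d)
usa_indexed = index.get("USA", [])

# Reads are now direct lookups, but every insert must also update
# the index, which is the write overhead described above.
new_doc = {"_id": 1000, "country": "USA"}
documents.append(new_doc)
index.setdefault(new_doc["country"], []).append(new_doc)
```

The more indexes a collection carries, the more of these extra updates each write pays, which is why unused indexes are worth removing.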
The _id field in MongoDB serves as a primary key and provides several key functionalities:
Uniqueness Guarantee: Each document must have a unique _id, which ensures data integrity.
Automatic Indexing: MongoDB automatically builds a unique index on _id, enhancing query efficiency.
Inherent Timestamp: An ObjectId _id embeds a creation timestamp, useful for time-based operations.
For instance, with an ObjectId, the first 8 hexadecimal characters represent a 4-byte timestamp:
$\text{timestamp} = \text{substr}(\text{ObjectId}, 0, 8)$
Concurrency Control: Because _id is backed by a unique index, concurrent writes cannot create two documents with the same _id; if multiple inserts use the same value, only the first succeeds and the rest fail with a duplicate key error.
Modify and Return: When inserting a new document, or finding and modifying an existing one, you can request the modified document, including its _id, in the response.
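Assuming the standard ObjectId layout (a 4-byte big-endian timestamp in the leading bytes), the embedded timestamp can be recovered from an ObjectId's hex string with plain Python and no driver at all; the 24-character ObjectId below is made up for illustration:

```python
from datetime import datetime, timezone

def objectid_timestamp(oid_hex: str) -> datetime:
    # The first 8 hex characters (4 bytes, big-endian) encode
    # the creation time as seconds since the Unix epoch.
    seconds = int(oid_hex[:8], 16)
    return datetime.fromtimestamp(seconds, tz=timezone.utc)

# Hypothetical ObjectId; real ones come from the driver or server.
print(objectid_timestamp("65a8e26d2f9b1e8a4f3c5d6e"))
```

Drivers expose the same value directly (for example, a generation_time attribute in PyMongo's bson.ObjectId), so this manual decoding is mainly useful for understanding the layout.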
Custom _id Values
While MongoDB provides automatic ObjectId generation, documents can also use custom values.
Custom Representations: Gain flexibility by using custom strings, numbers, or other valid BSON types for the _id field.
Controlled Uniformity: Design your own _id strategy to align with your data, such as employing natural keys for documents originating from specific external sources.
Migrate with Care: Once an application is live, altering the _id structure can be intricate; transition plans are vital for a seamless shift.
Custom Indexing: The automatic unique index still applies to a custom _id, so a well-chosen custom key keeps lookups compact and fast.
Choosing the _id Field: The choice between an automatic ObjectId and custom _id values links back to the intended data model, data access patterns, and specific domain requirements.
While the automatic ObjectId brings benefits like ease of use and an embedded timestamp, custom _id generation provides finer control and helps in scenarios where a specific data structure is favored or where external data sources need to be integrated.
The process for creating a new collection in MongoDB is simple and instantaneous.
Select the Database: Ensure you are connected to the intended database for the collection's creation. Switch to the desired database using use in the mongo shell, or select the database programmatically in your driver's API.
Perform a Write Operation: The new collection is created the moment you execute a write operation such as insert, update, or save.
Check Collection Existence (Optional): While not necessary for the creation process, you can verify the collection is created using the listCollections method.
To insert a document into a MongoDB collection, you can use the insertOne() method, which accepts the document as an argument:
db.collectionName.insertOne({
  key1: "value1",
  key2: 2,
  key3: [1, 2, 3],
  key4: { nestedKey: "nestedValue" }
});
Alternatively, you can insert several documents at once by supplying an array of documents to the insertMany() method:
db.collectionName.insertMany([
  { key: "value1" },
  { key: "value2" }
]);
To read data from a MongoDB collection, you use the find method with various options for querying and data manipulation.
findOne: behaves like find but retrieves only the first matching document.
Comparison operators: $eq, $gt, $lt, $in, $nin, etc.
Logical operators: $and, $or, $not, $nor, etc.
Element operators: $exists, $type.
Evaluation operators: $regex, $mod, $text.
Geospatial operators: $geoNear, $geoWithin, etc.
MongoDB also provides the aggregation framework for complex operations, using a pipeline of stages such as $match, $group, $sort, and $limit.
Here is the Python code:
import pymongo
client = pymongo.MongoClient("mongodb://localhost:27017/")
db = client["mydatabase"]
collection = db["mycollection"]
# Retrieve all documents
all_documents = collection.find()
# Iterate through the returned cursor to access each document:
for doc in all_documents:
    print(doc)
Here is the Python code:
# Let's say we have the following documents in the collection:
# [{
# "name": "John",
# "age": 30,
# "country": "USA"
# },
# {
# "name": "Jane",
# "age": 25,
# "country": "Canada"
# }]
# Retrieve the first document where the name is "John"
john_doc = collection.find_one({"name": "John"})
print(john_doc)  # Output: {"name": "John", "age": 30, "country": "USA"}
# Retrieve documents where age is greater than or equal to 25 and from country "USA"
filter_criteria = {"age": {"$gte": 25}, "country": "USA"}
docs_matching_criteria = collection.find(filter_criteria)
for doc in docs_matching_criteria:
    print(doc)
# Output: {"name": "John", "age": 30, "country": "USA"}
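To make the filter document's semantics concrete, here is a hedged, pure-Python sketch of how a filter like {"age": {"$gte": 25}, "country": "USA"} selects documents. It supports only equality and $gte, and is an illustration of the matching idea, not MongoDB's implementation:

```python
def matches(doc, criteria):
    """Toy matcher: plain equality and the $gte operator only."""
    for field, condition in criteria.items():
        if isinstance(condition, dict):  # operator form, e.g. {"$gte": 25}
            for op, operand in condition.items():
                if op == "$gte" and not (field in doc and doc[field] >= operand):
                    return False
        elif doc.get(field) != condition:  # plain equality
            return False
    return True

docs = [
    {"name": "John", "age": 30, "country": "USA"},
    {"name": "Jane", "age": 25, "country": "Canada"},
]
result = [d for d in docs if matches(d, {"age": {"$gte": 25}, "country": "USA"})]
# result holds only John's document
```

All top-level conditions must hold for a document to match, which is why the real filter above behaves like an implicit $and.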
Projection helps control the fields returned. It uses a dictionary where fields to include are marked with 1, and those to exclude with 0.
For instance, {"name": 1, "age": 1, "_id": 0} only includes name and age while excluding _id:
Here is the Python code:
# Retrieve the name and age fields, ignoring the _id field
docs_with_limited_fields = collection.find({}, {"name": 1, "age": 1, "_id": 0})
for doc in docs_with_limited_fields:
    print(doc)
# Output: {"name": "John", "age": 30}
# {"name": "Jane", "age": 25}
sort, skip, and limit help in reordering results, paginating, and limiting the result size.
Here is the Python code:
# Sort all documents by age in descending order
documents_sorted_by_age = collection.find().sort("age", -1)
# Skip the first two documents and retrieve the rest
documents_after_skipping = collection.find().skip(2)
# Limit the number of documents returned to 3
limited_documents = collection.find().limit(3)
Here is the Python code:
# Suppose, the collection has a "country" field for each document
# Get a list of distinct countries
distinct_countries = collection.distinct("country")
print(distinct_countries) # Output: ["USA", "Canada"]
Indexes improve read performance. Use appropriate indexes for frequent and complex queries to speed up data retrieval. If queries don't match the indexing pattern, or if the collection is small, the gain from indexing may be insignificant, and the extra indexes can even hurt the database's write performance. Choose an indexing strategy based on your data and usage patterns.
For example, if you frequently query documents based on their "country" field, consider creating an index on that field:
Here is a Python, PyMongo code:
collection.create_index("country")
This would make lookups based on the "country" field more efficient.
MongoDB offers several ways to update documents (equivalent to SQL's "rows"). Let's look at the most common methods.
updateOne() modifies specific fields of the first matching document using update operators such as $set, $inc, $push, and $unset. This resembles SQL's UPDATE statement with selective column updates.
Method: db.collectionName.updateOne()
Code:
db.collectionName.updateOne(
  { "name": "John Doe" },
  { $set: { "age": 30 } }
);
Use-Case: When replacing an entire document isn't needed. For example, when changing a user's email address.
Method: db.collectionName.replaceOne()
Code:
db.collectionName.replaceOne(
  { "name": "John Doe" },
  { "name": "John Doe", "age": 30 }
);
Use-Case: When an entire document needs updating or replacing, such as a product detail or a user’s information.
MongoDB offers several methods for deleting documents.
deleteOne(): Deletes the first matched document.
deleteMany(): Removes all matching documents.
remove(): Legacy function; use deleteOne() or deleteMany() instead.
For deleteOne(), the syntax is: db.collectionName.deleteOne(filter)
For deleteMany(), the syntax is: db.collectionName.deleteMany(filter)
Here is the MongoDB shell script:
// Connect to the database
use myDB;
// Delete a single document from 'myCollection'
db.myCollection.deleteOne({ name: "Document1" });
// Delete all documents from 'myCollection' with the condition 'age' greater than 25
db.myCollection.deleteMany({ age: { $gt: 25 } });