MongoDB Basics with IMDb Movies Dataset
Connecting to MongoDB Using Compass
- Download Compass from the MongoDB Download Center. If you already have Compass installed, make sure you are using version 1.8 or later, and upgrade if necessary.
- Install Compass on your computer from the download.
- Launch Compass. When Compass opens, you will see a page titled “Connect to Host”.
Content
The movies dataset includes 85,855 movies with attributes such as movie description, average rating, number of votes, genre, etc.
The rating dataset includes 85,855 rating details from a demographic perspective.
The names dataset includes 297,705 cast members with personal attributes such as birth details, death details, height, spouses, children, etc.
The title principals dataset includes 835,513 cast member roles in movies, with attributes such as IMDb title id, IMDb name id, order of importance in the movie, role, and characters played.
Download the IMDb movies data (CSV format) from the following website: https://www.kaggle.com/stefanoleone992/imdb-extensive-dataset
Commands
show dbs - lists the databases in the cluster.
use video - switches to another database (here, the one named video).
show collections - lists the collections in the current database.
db - refers to the current database.
db.movies.find().pretty() - displays the documents in the movies collection of the video database in a readable format.
db.movies.find().count() - counts the documents in the movies collection of the video database.
The Atlas clusters we’ve looked at are replica sets. Replica sets are designed so that if the primary node goes down, one of the other nodes will step up to take its place so that clients can continue reading and writing data as if nothing had happened. The mongo shell is one such client.
To begin creating your Atlas Sandbox cluster, visit the Atlas registration page and complete the account creation form you see there.
Connecting to our Atlas sandbox cluster from the Mongo shell
mongo "mongodb://sandbox-shard-00-00-2xiao.mongodb.net:27017,sandbox-shard-00-01-2xiao.mongodb.net:27017,sandbox-shard-00-02-2xiao.mongodb.net:27017/test?replicaSet=Sandbox-shard-0" --ssl --authenticationDatabase admin --username your-username --password your-password
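The connection string above has three parts: a comma-separated seed list of host:port pairs (one per replica set member), the database to connect to, and the replicaSet option. As a rough illustration in plain JavaScript, the string can be assembled from those parts. The helper name buildSeedListUri is hypothetical and is not part of any MongoDB tool.

```javascript
// Hypothetical helper: assembles a seed-list connection string from its parts.
function buildSeedListUri(hosts, port, database, replicaSet) {
  // One "host:port" entry per replica set member, joined by commas.
  const seedList = hosts.map((host) => `${host}:${port}`).join(",");
  return `mongodb://${seedList}/${database}?replicaSet=${replicaSet}`;
}

// Host names copied from the command above.
const hosts = [
  "sandbox-shard-00-00-2xiao.mongodb.net",
  "sandbox-shard-00-01-2xiao.mongodb.net",
  "sandbox-shard-00-02-2xiao.mongodb.net",
];

const uri = buildSeedListUri(hosts, 27017, "test", "Sandbox-shard-0");
console.log(uri);
```

Changing the database argument from test to video gives the string used later when loading the data set.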
Loading data (.js file) into your sandbox cluster
- cd M001 (the directory in your home directory that contains the .js file)
- Run the MongoDB shell:
mongo "mongodb://sandbox-shard-00-00-2xiao.mongodb.net:27017,sandbox-shard-00-01-2xiao.mongodb.net:27017,sandbox-shard-00-02-2xiao.mongodb.net:27017/video?replicaSet=Sandbox-shard-0" --ssl --authenticationDatabase admin --username your-username --password your-password
video is the db that you are loading your collections into. We can change to some other db as well.
- Once connected to the sandbox cluster, run
load("loadMovieDataSet.js")
Connecting to our Atlas sandbox cluster from Compass
- Go to Compass -> Connect to Host
- In cloud.mongodb.com:
a.) Select the sandbox cluster Sandbox -> select the Primary node, copy its URL sandbox-shard-00-00-2xiao.mongodb.net, and paste it into the Hostname field in Compass.
b.) In the Authentication section, select Username / Password and provide username -> your-username and password -> your-password.
c.) Add the Favorite Name M001 Sandbox. Click Connect.
Inserting Documents from MongoDB Shell
use video
show collections
One document at a time
Insert a document without specifying _id (MongoDB generates it automatically):
db.moviesScratch.insertOne({
title: "Star Trek II: The Wrath of Khan",
year: 1982,
imdb: "tt0084726"
});
Insert a document with a custom _id:
db.moviesScratch.insertOne({
_id: "tt0084726",
title: "Star Trek II: The Wrath of Khan",
year: 1982,
imdb: "tt0084726"
});
Many documents at a time
Ordered Insert
By default, insertMany performs an ordered insert. If an error occurs, MongoDB will stop processing remaining documents:
/* Ordered insert - stops on first error */
db.moviesScratch.insertMany([
{
"_id": "tt0084726",
"title": "Star Trek II: The Wrath of Khan",
"year": 1982,
"type": "movie"
},
{
"_id": "tt0796366",
"title": "Star Trek",
"year": 2009,
"type": "movie"
},
{
"_id": "tt0084726", // Duplicate _id - will cause error
"title": "Star Trek II: The Wrath of Khan",
"year": 1982,
"type": "movie"
},
{
"_id": "tt1408101",
"title": "Star Trek Into Darkness",
"year": 2013,
"type": "movie"
},
{
"_id": "tt0117731",
"title": "Star Trek: First Contact",
"year": 1996,
"type": "movie"
}
]);
Unordered Insert
With unordered insert, MongoDB will continue processing remaining documents even if an error occurs:
/* Unordered insert - continues even on errors */
db.moviesScratch.insertMany([
{
"_id": "tt0084726",
"title": "Star Trek II: The Wrath of Khan",
"year": 1982,
"type": "movie"
},
{
"_id": "tt0796366",
"title": "Star Trek",
"year": 2009,
"type": "movie"
},
{
"_id": "tt0084726", // Duplicate _id - error, but continues
"title": "Star Trek II: The Wrath of Khan",
"year": 1982,
"type": "movie"
},
{
"_id": "tt1408101",
"title": "Star Trek Into Darkness",
"year": 2013,
"type": "movie"
},
{
"_id": "tt0117731",
"title": "Star Trek: First Contact",
"year": 1996,
"type": "movie"
}
], {
"ordered": false
});
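To make the difference between the two modes concrete, here is a toy in-memory model in plain JavaScript. It is not how MongoDB is implemented: insertManyModel is a hypothetical helper that only enforces unique _id values, and each run assumes the collection starts empty.

```javascript
// Toy in-memory model of insertMany; it only enforces unique _id values.
function insertManyModel(collection, docs, options = {}) {
  const ordered = options.ordered !== false; // ordered is the default
  const insertedIds = [];
  const errors = [];
  for (const doc of docs) {
    if (collection.has(doc._id)) {
      errors.push(doc._id); // duplicate key
      if (ordered) break;   // ordered: stop at the first error
      continue;             // unordered: skip this document and keep going
    }
    collection.set(doc._id, doc);
    insertedIds.push(doc._id);
  }
  return { insertedIds, errors };
}

// Same _id sequence as the examples above (other fields omitted).
const batch = [
  { _id: "tt0084726" }, { _id: "tt0796366" }, { _id: "tt0084726" },
  { _id: "tt1408101" }, { _id: "tt0117731" },
];

const orderedResult = insertManyModel(new Map(), batch);
const unorderedResult = insertManyModel(new Map(), batch, { ordered: false });
console.log(orderedResult.insertedIds.length);   // 2
console.log(unorderedResult.insertedIds.length); // 4
```

In the ordered run, the two documents after the duplicate are never attempted. In the unordered run, only the duplicate itself is rejected.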
Filtering with queries
In Compass
{$and: [{"awards.wins": 2}, {"awards.nominations": 2}]}
In MongoDB command shell
Count documents matching multiple conditions:
db.movieDetails.find({$and: [{"awards.wins": 2}, {"awards.nominations": 2}]}).count()
Find movies with exact cast array:
db.movies.find({cast: ["Jeff Bridges", "Tim Robbins"]}).pretty()
Count movies in the Family genre:
db.movieDetails.find({genres: 'Family'}).count()
Count movies where Western is the second genre:
db.movieDetails.find({"genres.1": 'Western'}).count()
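The three array filters above behave differently, which is easy to miss. The following plain-JavaScript sketch models each form as a small predicate. The function names and the sample document are hypothetical, and the model ignores edge cases such as embedded documents.

```javascript
// Exact array match, as in {cast: [...]}: same elements in the same order.
function matchesExactArray(fieldValue, wanted) {
  return Array.isArray(fieldValue) &&
    fieldValue.length === wanted.length &&
    fieldValue.every((item, i) => item === wanted[i]);
}

// Element match, as in {genres: "Family"}: the value appears anywhere in the array.
function matchesAnyElement(fieldValue, wanted) {
  return Array.isArray(fieldValue) ? fieldValue.includes(wanted) : fieldValue === wanted;
}

// Positional match, as in {"genres.1": "Western"}: the element at that index equals the value.
function matchesAtIndex(fieldValue, index, wanted) {
  return Array.isArray(fieldValue) && fieldValue[index] === wanted;
}

// Hypothetical document used only for illustration.
const movie = {
  cast: ["Tim Robbins", "Jeff Bridges"],
  genres: ["Comedy", "Western", "Family"],
};

console.log(matchesExactArray(movie.cast, ["Jeff Bridges", "Tim Robbins"])); // false (order differs)
console.log(matchesAnyElement(movie.genres, "Family"));  // true
console.log(matchesAtIndex(movie.genres, 1, "Western")); // true (index 1 is the second element)
console.log(matchesAtIndex(movie.genres, 0, "Western")); // false
```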
Cursors
The find() method returns a cursor. A cursor is essentially a pointer to the current location in a result set. For queries that return more than a few documents, MongoDB returns the results in batches to our client (the mongo shell). We use the cursor in our client to iterate through the results.
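Batching can be pictured with a small generator in plain JavaScript. This is only a sketch of the idea, not the server's protocol: batchCursor is a hypothetical name, and the batch size of 20 and the 45 fake results are arbitrary values chosen for the example.

```javascript
// Toy cursor: hands out results in fixed-size batches, the way a client
// asks for the next batch once the current one is exhausted.
function* batchCursor(results, batchSize) {
  for (let start = 0; start < results.length; start += batchSize) {
    yield results.slice(start, start + batchSize);
  }
}

// 45 placeholder documents standing in for a query result.
const results = Array.from({ length: 45 }, (_, i) => ({ _id: i }));

const batchSizes = [];
for (const batch of batchCursor(results, 20)) {
  batchSizes.push(batch.length);
}
console.log(batchSizes.join(", ")); // 20, 20, 5
```

In the shell itself, the same iteration is done by assigning the result of find() to a variable and calling hasNext() and next() on it.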
Projections
Projections reduce network overhead and processing requirements by limiting the fields that are returned in the resulting documents. In a query, the projection is passed as the second argument to the find() method.
Example - Return only title field:
db.movies.find({genre: "Action, Adventure"}, {title: 1})
The _id field is returned by default. You can exclude it:
db.movies.find({genre: "Action, Adventure"}, {title: 1, _id: 0})
Exclude multiple fields:
db.movies.find(
{genre: "Action, Adventure"},
{viewerRating: 0, viewerVotes: 0, runtime: 0, _id: 0}
)
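The inclusion and exclusion rules above can be summarised in a few lines of plain JavaScript. This is a simplified model that handles top-level fields only. The project function and the sample document are hypothetical.

```javascript
// Toy model of a find() projection on top-level fields.
function project(doc, spec) {
  const keys = Object.keys(spec).filter((key) => key !== "_id");
  // Any field set to 1 puts the projection in inclusion mode.
  const inclusion = keys.some((key) => spec[key] === 1);
  const result = {};
  for (const [field, value] of Object.entries(doc)) {
    if (field === "_id") {
      if (spec._id !== 0) result._id = value; // _id is kept unless explicitly excluded
    } else if (inclusion ? spec[field] === 1 : spec[field] !== 0) {
      result[field] = value;
    }
  }
  return result;
}

// Hypothetical document used only for illustration.
const doc = {
  _id: 1, title: "Example Movie", genre: "Action, Adventure",
  runtime: 100, viewerRating: 7.1, viewerVotes: 10,
};

console.log(JSON.stringify(project(doc, { title: 1 })));
// {"_id":1,"title":"Example Movie"}
console.log(JSON.stringify(project(doc, { title: 1, _id: 0 })));
// {"title":"Example Movie"}
console.log(JSON.stringify(project(doc, { viewerRating: 0, viewerVotes: 0, runtime: 0, _id: 0 })));
// {"title":"Example Movie","genre":"Action, Adventure"}
```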
Updating documents
Update a single field with $set
db.movieDetails.updateOne({
title: "The Martian"
}, {
$set: {
poster: ""
}
});
Update nested documents
db.movieDetails.updateOne({
title: "The Martian"
}, {
$set: {
"awards": {
"wins": 8,
"nominations": 14,
"text": "Nominated for 3 Golden Globes. Another 8 wins and 14 nominations."
}
}
});
$set replaces the value of the field with the specified value.
The difference between updateOne and updateMany is that updateOne modifies only the first document that matches the filter, whereas updateMany makes the same modification to all documents that match the filter.
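Both points can be seen in a toy model written in plain JavaScript. It is a sketch under simplifying assumptions (top-level fields only, in-memory array instead of a collection). The helper names and sample documents are hypothetical.

```javascript
// Toy model: $set on a top-level field replaces whatever value was there.
function applySet(doc, fields) {
  return { ...doc, ...fields };
}

// Modifies only the first document that satisfies the predicate.
function updateOneModel(docs, predicate, fields) {
  const index = docs.findIndex(predicate);
  return docs.map((doc, i) => (i === index ? applySet(doc, fields) : doc));
}

// Modifies every document that satisfies the predicate.
function updateManyModel(docs, predicate, fields) {
  return docs.map((doc) => (predicate(doc) ? applySet(doc, fields) : doc));
}

// Hypothetical documents used only for illustration.
const docs = [
  { title: "A", rated: "PG", awards: { wins: 1, nominations: 2, text: "old" } },
  { title: "B", rated: "PG", awards: { wins: 0, nominations: 0, text: "old" } },
];
const isPG = (doc) => doc.rated === "PG";
const newAwards = { awards: { wins: 8 } };

const afterOne = updateOneModel(docs, isPG, newAwards);
const afterMany = updateManyModel(docs, isPG, newAwards);
console.log(JSON.stringify(afterOne[0].awards)); // {"wins":8}
console.log(JSON.stringify(afterOne[1].awards)); // {"wins":0,"nominations":0,"text":"old"}
console.log(afterMany.filter((d) => d.awards.wins === 8).length); // 2
```

Note that the whole awards value is replaced: the old nominations and text fields are gone from the first document, which is why the update above supplies all three fields.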
Remove fields with $unset
db.movieDetails.updateMany({
rated: null
}, {
$unset: {
rated: ""
}
});
Upsert - Update or Insert
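We can insert while updating by passing the upsert option. In the example below, detail is assumed to be a document object that has already been defined in the shell: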
db.movieDetails.updateOne({
"imdb.id": detail.imdb.id
}, {
$set: detail
}, {
upsert: true
});
With upsert: true, updateOne updates the first document that matches the filter. If no document matches, MongoDB inserts a new document built from the filter and the update.
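The update-or-insert decision can be sketched in plain JavaScript as follows. This is a simplified model that assumes equality filters on top-level fields and a $set-style update. upsertModel and the flat imdbId field are hypothetical names used only for this illustration.

```javascript
// Toy model of updateOne with upsert: true.
function upsertModel(docs, filter, setFields) {
  const matches = (doc) => Object.entries(filter).every(([key, value]) => doc[key] === value);
  const index = docs.findIndex(matches);
  if (index === -1) {
    // No match: insert a new document built from the filter and the update.
    docs.push({ ...filter, ...setFields });
    return "inserted";
  }
  // Match: apply the update to the existing document.
  docs[index] = { ...docs[index], ...setFields };
  return "updated";
}

const collection = []; // starts empty
const first = upsertModel(collection, { imdbId: "tt3659388" }, { title: "The Martian" });
const second = upsertModel(collection, { imdbId: "tt3659388" }, { year: 2015 });

console.log(first);             // inserted
console.log(second);            // updated
console.log(collection.length); // 1
```

Running the same upsert twice therefore never creates a duplicate: the first call inserts, and later calls update the document it created.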
Replace entire document
db.movieDetails.replaceOne({
"imdb.id": detail.imdb.id
},
detailDoc
);
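Unlike $set, which changes only the named fields, replaceOne swaps out the whole document and keeps only its _id. A minimal plain-JavaScript model of that behaviour is shown here. replaceOneModel and the sample document are hypothetical.

```javascript
// Toy model of replaceOne: the first matching document is replaced entirely,
// except for its _id, which is preserved.
function replaceOneModel(docs, predicate, replacement) {
  const index = docs.findIndex(predicate);
  if (index !== -1) {
    docs[index] = { _id: docs[index]._id, ...replacement };
  }
  return docs;
}

// Hypothetical document used only for illustration.
const stored = [{ _id: 1, title: "Old Title", year: 1999, rated: "PG" }];
replaceOneModel(stored, (doc) => doc.title === "Old Title", { title: "New Title" });

console.log(JSON.stringify(stored[0])); // {"_id":1,"title":"New Title"}
```

The year and rated fields are gone because they were not part of the replacement document.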
Query Operators
Greater than ($gt)
db.movieDetails.find({runtime: {$gt: 90}})
With projection:
db.movieDetails.find({runtime: {$gt: 90}}, {_id: 0, title: 1, runtime: 1})
Range queries
/* Greater than 90 AND less than 120 */
db.movieDetails.find(
{runtime: {$gt: 90, $lt: 120}},
{_id: 0, title: 1, runtime: 1}
)
Greater/Less than or equal ($gte, $lte)
db.movieDetails.find(
{runtime: {$gte: 90, $lte: 120}},
{_id: 0, title: 1, runtime: 1}
)
Multiple conditions
db.movieDetails.find(
{runtime: {$gte: 180}, "tomato.meter": 100},
{_id: 0, title: 1, runtime: 1}
)
Not equal ($ne)
db.movieDetails.find(
{rated: {$ne: "UNRATED"}},
{_id: 0, title: 1, rated: 1}
)
In array ($in)
/* Find movies rated G or PG */
db.movieDetails.find(
{rated: {$in: ["G", "PG"]}},
{_id: 0, title: 1, rated: 1}
)
/* Multiple values with pretty output */
db.movieDetails.find(
{rated: {$in: ["G", "PG", "PG-13"]}},
{_id: 0, title: 1, rated: 1}
).pretty()
db.movieDetails.find(
{rated: {$in: ["R", "PG-13"]}},
{_id: 0, title: 1, rated: 1}
).pretty()
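As a rough mental model, each operator is a predicate applied to a field value, and all conditions in a filter must hold at once. The plain-JavaScript sketch below covers only the operators used above, on scalar top-level fields. The matches function and the operators table are hypothetical, and real MongoDB comparison rules (types, arrays, dotted paths) are more involved.

```javascript
// Each operator takes the document's field value and the operator argument.
const operators = {
  $gt: (value, arg) => value > arg,
  $gte: (value, arg) => value >= arg,
  $lt: (value, arg) => value < arg,
  $lte: (value, arg) => value <= arg,
  $ne: (value, arg) => value !== arg,
  $in: (value, arg) => arg.includes(value),
};

// A document matches when every field condition in the query holds.
function matches(doc, query) {
  return Object.entries(query).every(([field, condition]) => {
    const value = doc[field];
    if (condition === null || typeof condition !== "object") {
      return value === condition; // plain equality, e.g. {rated: "R"}
    }
    return Object.entries(condition).every(([op, arg]) => operators[op](value, arg));
  });
}

// Hypothetical documents used only for illustration.
const shortFilm = { title: "A", runtime: 95, rated: "PG" };
const twoHours = { title: "B", runtime: 120, rated: "UNRATED" };

console.log(matches(shortFilm, { runtime: { $gt: 90, $lt: 120 } }));  // true
console.log(matches(twoHours, { runtime: { $gt: 90, $lt: 120 } }));   // false (120 is not < 120)
console.log(matches(twoHours, { runtime: { $gte: 90, $lte: 120 } })); // true
console.log(matches(twoHours, { rated: { $ne: "UNRATED" } }));        // false
console.log(matches(shortFilm, { rated: { $in: ["G", "PG"] } }));     // true
```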
Comparison Operators
Complex query with multiple operators:
db.movies.find({
cast: {$in: ["Jack Nicholson", "John Huston"]},
viewerRating: {$gt: 7},
mpaaRating: "R"
}).count()
Note: This post is based on material from the M001-Basics course offered by MongoDB University.