MongoDB - Tips Beyond ORMs

For those who only have a hammer as a tool, all problems look like nails. - Mark Twain.

Using the tool properly

I don't see any problem in using an ORM with MongoDB, in fact I always use it, but the problem I see in internet tutorials and in places I've worked is that a lot of people take Mongoose (for example) and just give a .find({ ... }) or .findOne({ ... }) to bring a truckload of data to do processing that the database itself could perform. The ORMs themselves port native MongoDB features like the aggregation pipeline so you don't have to keep switching between ORM and native driver.

The goal

My goal with this article is to show you tips on how to do some operations on MongoDB more efficiently.

Setup

Let's enter some data so we can have context for the following examples. In this case we will use a list of courses. The version used of MongoDB is 4.4.6.

// Insere alguns documentos usando o mongosh
db.courses.insertMany([
  {
    sku: "UD001",
    title: "Understanding TypeScript",
    description: "Não limite o uso de TypeScript para Angular! Aprenda essa linguagem, seus recursos, fluxos de trabalho e como usá-la!",
    tags: [
        "javascript", "typescript", "nodejs"
    ],
    reviews: [
        {
          id: "60be9aae438bfc22d2078ff7",
          rate: 4
        }, {
          id: "60be9cff438bfc22d2079000",
          rate: 4
        },
        {

          id: "60be9cfe438bfc22d2078ffe",
          rate: 4
        },
        {

          id: "60bea1d0438bfc22d2079001",
          rate: 4
        }
    ],
    enrollment: 4786,
    author: "Leonardo Moura Leitao",
    incentives: {
        certificate: false,
        lifetime_access: true
    },
    created_at: new Date('2020-01-01T08:00:00Z'),
    updated_at: new Date('2021-01-01T08:00:00Z')
  },
  {
    sku: "UD002",
    title: "NodeJs, Typescript, TDD, DDD, Clean Architecture e SOLID",
    description: "Aprenda de verdade a criar uma API completa com Typescript utilizando TDD, Clean Architecture, Design Patterns e SOLID",
    tags: [
        "nodejs", "typescript", "tdd", "ddd", "solid", "clean-arquiteture"
    ],
    reviews: [
      {
        id: "60be9aae438bfc22d2078ff7",
        rate: 5
      }, {
        id: "60be9aaf438bfc22d2078ff9",
        rate: 4
      },
      {
        id: "60be9ab0438bfc22d2078ffa",
        rate: 4
      }
    ],
    enrollment: 2352,
    author: "Rodrigo Manguinho",
    incentives: {
      certificate: true,
      lifetime_access: true
    },
    created_at: new Date('2020-01-01T08:00:00Z'),
    updated_at: new Date('2021-01-01T08:00:00Z')
    }
]);

Those with an SQL background are probably more used to selecting which fields to return from a query, but those who use an ORM are generally not aware of this. Whenever possible, select the fields that will be returned in your query, making queries returning unnecessary data and impacts the performance, but increase the data traffic between the database and the client.


db.courses.find({}, { sku: 1, title: 1 });

/**
result...
[
  {
    "_id": {
      "$oid": "60bea27200f45d28a43ab0f0"
    },
    "sku": "UD001",
    "title": "Understanding TypeScript"
  },
  {
    "_id": {
      "$oid": "60bea27200f45d28a43ab0f1"
    },
    "sku": "UD002",
    "title": "NodeJs, Typescript, TDD, DDD, Clean Architecture e SOLID"
  }
]
*/

It is even possible to remove the _id generated by MongoDB.


db.courses.find({}, { _id:0, sku: 1, title: 1 });

/**
result...
[
  {
    "sku": "UD001",
    "title": "Entendendo TypeScript"
  },
  {
    "sku": "UD002",
    "title": "NodeJs, Typescript, TDD, DDD, Clean Architecture e SOLID"
  }
]
*/

Let's imagine that our system requirement is to return the course with the following data:

sku
title
review_count

If the reviews were in another collection and we had to do a $lookup to cross the courses with the reviews , it would be much more complicated and would consume a lot more resources, but luckily our structure is perfect... or is it not? Let's see the following query:


var courses = db.courses.find({});

courses.map(function(course) { 
  return {
    title: course.title,
    sku: course.sku,
    review_count: course.reviews.length,
  }; 
});

/**
result...
[
  {
    "title": "Understanding TypeScript",
    "sku": "UD001",
    "review_count": 4
  },
  {
    "title": "NodeJs, Typescript, TDD, DDD, Clean Architecture e SOLID",
    "sku": "UD002",
    "review_count": 3
  }
]
*/

The problem with this approach is that if the .map is executed on the client, a lot of data would have to travel over the network to do unnecessary processing, since there are optimized ways to perform the same task. Be aware that MongoDB supports up to 100,000 records in an array and you reviews list can't exceed this limit. In addition to that, the total document size cannot exceed 16 mb.

A better way to query is using aggregate:


db.courses.aggregate([
  {
    $project: {
      _id: 0,
      title: 1, 
      sku: 1,
      review_count: {
        $size: "$reviews"
      }
    }
  }
]);

Considering that this query is called several times, always with the intent of showing the number of reviews, we can improve our data and reduce computational effort, leaving the pre-calculated value.


db.courses.updateMany({ },[
  {
    $addFields: {
      review_count: {
        $size: "$reviews"
      }
    }
  }
]);

Now we can simply use:


db.courses.aggregate([
  {
    $project: {
      _id: 0,
      title: 1, 
      sku: 1,
      review_count: 1
    }
  }
]);

// OR

db.courses.find({}, {
  _id: 0,
  title: 1, 
  sku: 1,
  review_count: 1
});

The solution above is very good for doing a search, but what if you need to include a new review? One of the possibilities is to do as follows:


db.courses.updateOne({ sku: "UD001" }, { 
  // Adiciona mais um elemento em um campo tipo array
  $push: {
    reviews: {
      id: "60beba7d438bfc22d2079002",
      rate: 3
    }
  },
  // Incrementa a quantidade de reviews
  $inc: { review_count: 1 }
});

To remove a review we can use the code below, but an unintentional bug will be introduced.


db.courses.updateOne({ sku: "UD001" }, { 
  // Remove o elemento do array
  $pull: {
    reviews: {
      id: "60beba7d438bfc22d2079002",
      rate: 3
    }
  },
  // Decrementa a quantidade de reviews
  $inc: { review_count: -1 }
});

Running this code snippet once will generate the expected result, but if it runs again, the item cannot be removed as it no longer exists in reviews, but the decrement in review_count will occur and have a different value than size of the reviews array. Using the review rate to search is also not interesting, so we will do it in a way that you can trust the change and the final result.

// Query for a specific document with a review id and remove it.
db.courses.updateOne({ sku: "UD001", "reviews.id": "60be9cfe438bfc22d2078ffe" }, { 
  // Remove o elemento do array
  $pull: {
    reviews: {
      id: "60be9cfe438bfc22d2078ffe"
    }
  },
  // Decrease the review count.
  $inc: { review_count: -1 }
});

The review_count is working as expected, but what if the requirements change and we need to add the average review grades? How to keep consistency if the average needs the amount of reviews? how to know this value when updating? Well, we have a letter up our sleeve !

db.courses.updateMany({ sku: "UD001" },[
  {

    $addFields: {
      reviews: {
        $concatArrays: [ "$reviews", [{
          id: "60beba7d438bfc22d2079002",
          rate: 3
        }] ]
    }
    }
  },
  {

    $addFields: {
      review_count: {
        $size: "$reviews"
      },
      review_rating: {
        $avg: "$reviews.rate"
      }
    }
  }
]);

To update documents that don't have review_rating:


db.courses.updateMany({ review_rating: { $exists: false } },[
  {

    $addFields: {
      review_count: {
        $size: "$reviews"
      },
      review_rating: {
        $avg: "$reviews.rate"
      }
    }
  }
]);

If you check the result you can see that review_rating of the document with sku UD002 has the value 4.333333333333333, to have the result stored with a single decimal place we have to update our code to:


db.courses.updateMany({},[
  {
    $addFields: {
      review_count: {
        $size: "$reviews"
      },
      review_rating: {
        $round: [
          {
            $avg: "$reviews.rate"
          },
          1
        ]
      }
    }
  }
]);

Now let's simulate that one of the users changed the rating that was given in a review, but first I want you to take a second to think about how to make this change...

It's not always necessary to be a performance freak and try to optimize every bit of your code to be perfect, but there is a practice that is abused and even to make a document inconsistent, which is to recover the entire document, change the property and save the entire document again, for example:

// Mongoose example.
const course = await CourseModel.findOne({ 
    sku: "UD001"
}).exec();

for (const review of course.reviews) {
    if(review.id === "60be9cff438bfc22d2079000"){
        review.rate = 5;
    }
}

course.save();

Imagine if reviews was an array of 1000 or 10000 positions! 🤯 MongoDB is an enterprise level database and the problem is not an array having 10000 positions, the problem is a poorly designed "schema" and queries being done in arbitrary ways. Let's solve the same problem, but in a much better way.

// Only updated the rate value
db.courses.updateOne({ sku: "UD001", "reviews.id": "60be9cff438bfc22d2079000" },
    { $set: { "reviews.$.rate": 5 } }
);

// Updated the rate value and recalculate the review rating
db.courses.updateOne({ sku: "UD001", "reviews.id": "60be9cff438bfc22d2079000" },
    [
        { 
          $set: { 
            reviews:{
               "$map":{
                  "input":"$reviews",
                  as:"review",
                  in:{
                     "$mergeObjects":[
                        "$$review",
                        {
                           "$cond":[
                              {
                                 "$eq":[
                                    "$$review.id",
                                    "60be9cff438bfc22d2079000"
                                 ]
                              },
                              {
                                 "rate": 5
                              },
                              {

                              }
                           ]
                        }
                     ]
                  }
               }
            }
           }
         },
        { 
           $set: { "review_rating": {
            $round: [
          {
            $avg: "$reviews.rate"
          },
          1
        ]
        } } }
    ]
);

Now, the pipeline is much more complicated, it may have a less verbose way of doing it, but at least we are guaranteeing consistency in our update.

Despite being schemaless, MongoDB allows a very powerful schema validations but for now I'll leave an example just to wrap up the context that we are working on.


db.runCommand( {
   collMod: "courses",
   validator: { 
      $expr: { 
         $and:[
            { $eq: [{ $size: "$reviews"}, "$review_count"]},
            { $eq: [{
            $round: [
                  {
                     $avg: "$reviews.rate"
                  },
                  1
               ]

            }, "$review_rating"]}
         ]
      },
      $jsonSchema: {
      bsonType: "object",
      required: [ "review_rating", "review_count", "reviews" ],
      properties: {
         reviews: {
            bsonType: "array",
            description: "reviews deve ser um array"
         },
      }
   } },
   validationLevel: "moderate"
} );

Using schema validation we achieved the flexibility of a schemaless database that mongo can provide while ensuring the consistency of the stored data.

This was my first post and I already thank you for reading it to the end!

Photo by Jiawen Liu on Unsplash

MongoDB - Tips Beyond ORMs

Using the tool properly

The goal

Setup

Comments (3)

More from this blog

The Developer Saga: Postgres Part #1 - Database Access

MongoDB - Dicas além dos ORMs PT-BR

Command Palette

Using the tool properly

The goal

Setup

Comments (3)

More from this blog