Exploring Mongo's Aggregation Pipeline
Pipelines in Mongo
Previously, we looked at a “pipeline” $lookup
-> $unwind
-> $project
.
Use $lookup
to create the join. Then $unwind
to focus on the “result” of the $lookup
(“as:…"). Finally, $project
to specify the specific key-value pairs you want.
Note: In Mongo Compass Charts, you can re-use either queries
{}
or aggregation
[]
entered into the query field
Here’s an example to join the bounties
and customers
collection from the Bounty Board project:
db.bounties.aggregate([
{
$lookup: {
from: "customers",
localField: "customer_id",
foreignField: "customer_id",
as: "customerName",
},
},
{
$unwind: "$customerName",
},
{
$project: {
_id: 0,
title: 1,
"customerName.customerName": 1,
},
},
]);
Aggregation Framework
Let’s breakdown aggregation pipelines further. Here are some variations:
- match-group-sort
- match-sortByCount
- match-sort-project
- match-project-group-sort
- match-project-group
- project-sort
- project-group
- (multiple $project) to project in a specific order
This shows the flexibility of Aggregation pipelines and different ways to querying and displaying data.
Here are codes associated with the above examples:
- match-group-sort
# seasonal filter
# group by Bounty Creator, number of bounties per creator
# average reward posted
# sort by total bounties per person
db.bounties
.aggregate([
{ $match: { season: 2 } },
{
$group: {
_id: { creator_name: "$createdBy.discordHandle" },
num_bounties_created: { $sum: 1 },
avg_reward_amt: { $avg: "$reward.amount" },
},
},
{ $sort: { num_bounties_created: -1 } },
])
.pretty();
- match-sortByCount
# short cut to grouping by Bounty Creator
# can apply to Claimers and Submitters
db.bounties.aggregate([
{ $match: { season: 2 } },
{ $sortByCount: "$createdBy.discordHandle" },
]);
- match-sort-project
# seasonal filter
# sort by createdAt date, starting with most recent
# project/display only title and createdAt date
# note: can use '$toDate' instead of '$convert, input, to'
db.bounties.aggregate([
{ $match: { season: 2 } },
{ $sort: { createdAt: -1 } },
{
$project: {
_id: 0,
title: 1,
createdAt: { $convert: { input: "$createdAt", to: "date" } },
},
},
]).pretty();
- match-project-group-sort
# seasonal filter
# display avg_reward bountied per week
# group by week
# sort by week, by week in descending order
db.bounties.aggregate([
{ $match: { season: 1 } },
{
$project: {
_id: 0,
season: 1,
"reward.amount": 1,
createdAt: { $toDate: "$createdAt" },
},
},
{
$group: {
_id: { week: { $isoWeek: "$createdAt" } },
avg_reward: { $avg: "$reward.amount" },
},
},
{ $sort: { week: -1 } },
]);
- match-project-group
Here you can filter by multiple conditions, then project the columns you want to see and then group to get a summation.
This was useful to tally total values in Bounties given and/or claimed.
db.bounties.aggregate([
{$match: {"$and": [{season: 2}, {customer_id: '905250069463326740'}, {status: "Completed"}]}},
{$project: {_id: 0, customer_id: 1, "reward.amount": 1, "reward.currency": 1, status: 1}},
{$group: {_id: "$customer_id", sum: {$sum: "$reward.amount"}}}
])
- project-sort
# display subset of fields in bounties
# then, sort by reward amount
db.bounties.aggregate([
{
$project: {
_id: 0,
season: 1,
title: 1,
"reward.amount": 1,
"reward.currency": 1,
},
},
{ $sort: { "reward.amount": -1 } },
]);
- project-group
# group bounty by Season,
# average reward committed
db.bounties.aggregate([
{ $project: { _id: 0, season: 1, "reward.amount": 1 } },
{
$group: {
_id: { bounty_season: "$season" },
avg_reward: { $avg: "$reward.amount" },
},
},
]);
- (multiple $project) to project in a specific order
Here I used $project
twice consecutively. First time to put the key-value fields in a specific order (i.e., createdAt, then claimedAt, then submittedAt, then reviewedAt). Second time to get the original field names.
db.bounties.aggregate([
{$match: {"$and": [{season: 2}, {customer_id: '905250069463326740'}]}},
{$project: {_id: 0, _customer_id: "$customer_id", _title: "$title", _status: "$status", _createdAt: "$createdAt", _claimedAt: "$claimedAt", _submittedAt: "$submittedAt", _reviewedAt: "$reviewedAt"}},
{$project: {_id: 0, customer_id: "$_customer_id", title: "$_title", status: "$_status", createdAt: "$_createdAt", claimedAt: "$_claimedAt", submittedAt: "$_submittedAt", reviewedAt: "$_reviewedAt"}}
])
For more content on data science, R, and Python find me on Twitter.