Create Categorical Buckets

Create categorical buckets from a continuous variable

Situation: You have a continuous variable like “age” and want to create categorical buckets (i.e., ages 10-20, 21-30, 31-40 etc)

Note: We can create minimum ages here

# creating age buckets
# group documents into those buckets
# summary statistics

db.contacts.aggregate([
    {$bucket: {groupBy: "$dob.age", boundaries: [18, 30, 40, 50, 60, 120], output: {
        numPersons: { $sum: 1},
        averageAge: {$avg: "$dob.age"}
    }}}
]).pretty()

Note: This version auto-generates buckets

// buckets auto
// break continuous variable into categorical

db.contacts.aggregate([
 {
     $bucketAuto: {
         groupBy: "$dob.age",
         buckets: 5,
         output: {
            numPersons: { $sum: 1},
            averageAge: {$avg: "$dob.age"}
         }
     }
 }
]).pretty()

For more content on data science, R, and Python find me on Twitter.

Previous
Next