Educational repository demonstrating approaches for storing tree structures with NoSQL database MongoDB
Educational repository demonstrating approaches for storing tree structures with NoSQL database MongoDB
In a real life almost any project deals with the tree structures. Different kinds of taxonomies, site structures etc require modelling of hierarhy relations. In this article I will illustrate using five typical approaches plus one combination of operating with hierarchy data on example of the MongoDB database. Those approaches are:
Note: article is inspired by another article 'Model Tree Structures in MongoDB' by MongoDB, but does not copy it, but provides additional examples on typical operations with tree management. Please refer for 10gen article to get more solid understanding of the approach.
As a demo dataset I use some fake eshop goods taxonomy.
In a typical site scenario, we should be able
On each of the examples below we:
Please refer to image above for visual representation.
This is most commonly used approach. For each node we store (ID, ParentReference, Order).
Pretty simple, but changing the position of the node withing siblings will require additional calculations. You might want to set high numbers like item position * 10^6 for order in order to be able to set new node order as trunc (lower sibling order - higher sibling order)/2 - this will give you enough operations, until you will need to traverse whole the tree and set the order defaults to big numbers again.
var existingelemscount = db.categoriesPCO.find({parent:'Electronics'}).count(); var neworder = (existingelemscount+1)*10; db.categoriesPCO.insert({_id:'LG', parent:'Electronics', someadditionalattr:'test', order:neworder}) //{ "_id" : "LG", "parent" : "Electronics", "someadditionalattr" : "test", "order" : 40 }
existingelemscount = db.categoriesPCO.find({parent:'Cell_Phones_and_Smartphones'}).count(); neworder = (existingelemscount+1)*10; db.categoriesPCO.update({_id:'LG'},{$set:{parent:'Cell_Phones_and_Smartphones', order:neworder}}); //{ "_id" : "LG", "order" : 60, "parent" : "Cell_Phones_and_Smartphones", "someadditionalattr" : "test" }
db.categoriesPCO.remove({_id:'LG'});
db.categoriesPCO.find({$query:{parent:'Electronics'}, $orderby:{order:1}}) //{ "_id" : "Cameras_and_Photography", "parent" : "Electronics", "order" : 10 } //{ "_id" : "Shop_Top_Products", "parent" : "Electronics", "order" : 20 } //{ "_id" : "Cell_Phones_and_Accessories", "parent" : "Electronics", "order" : 30 }
Unfortunately, also involves recursive operation
var descendants=[] var stack=[]; var item = db.categoriesPCO.findOne({_id:"Cell_Phones_and_Accessories"}); stack.push(item); while (stack.length>0){ var currentnode = stack.pop(); var children = db.categoriesPCO.find({parent:currentnode._id}); while(true === children.hasNext()) { var child = children.next(); descendants.push(child._id); stack.push(child); } } descendants.join(",") //Cell_Phones_and_Smartphones,Headsets,Batteries,Cables_And_Adapters,Nokia,Samsung,Apple,HTC,Vyacheslav
Unfortunately involves recursive operations
var path=[] var item = db.categoriesPCO.findOne({_id:"Nokia"}) while (item.parent !== null) { item=db.categoriesPCO.findOne({_id:item.parent}); path.push(item._id); } path.reverse().join(' / '); //Electronics / Cell_Phones_and_Accessories / Cell_Phones_and_Smartphones
Recommended index is on fields parent and order
db.categoriesPCO.ensureIndex( { parent: 1, order:1 } )
For each node we store (ID, ChildReferences).
Please note, that in this case we do not need order field, because Childs collection already provides this information. Most of languages respect the array order. If this is not in case for your language, you might consider additional coding to preserve order, however this will make things more complicated.
db.categoriesCRO.insert({_id:'LG', childs:[]}); db.categoriesCRO.update({_id:'Electronics'},{ $addToSet:{childs:'LG'}}); //{ "_id" : "Electronics", "childs" : [ "Cameras_and_Photography", "Shop_Top_Products", "Cell_Phones_and_Accessories", "LG" ] }
rearranging order under the same parent
db.categoriesCRO.update({_id:'Electronics'},{$set:{"childs.1":'LG',"childs.3":'Shop_Top_Products'}}); //{ "_id" : "Electronics", "childs" : [ "Cameras_and_Photography", "LG", "Cell_Phones_and_Accessories", "Shop_Top_Products" ] }
moving the node
db.categoriesCRO.update({_id:'Cell_Phones_and_Smartphones'},{ $addToSet:{childs:'LG'}}); db.categoriesCRO.update({_id:'Electronics'},{$pull:{childs:'LG'}}); //{ "_id" : "Cell_Phones_and_Smartphones", "childs" : [ "Nokia", "Samsung", "Apple", "HTC", "Vyacheslav", "LG" ] }
db.categoriesCRO.update({_id:'Cell_Phones_and_Smartphones'},{$pull:{childs:'LG'}}) db.categoriesCRO.remove({_id:'LG'});
Note requires additional client side sorting by parent array sequence
var parent = db.categoriesCRO.findOne({_id:'Electronics'}) db.categoriesCRO.find({_id:{$in:parent.childs}})
Result set:
{ "_id" : "Cameras_and_Photography", "childs" : [ "Digital_Cameras", "Camcorders", "Lenses_and_Filters", "Tripods_and_supports", "Lighting_and_studio" ] } { "_id" : "Cell_Phones_and_Accessories", "childs" : [ "Cell_Phones_and_Smartphones", "Headsets", "Batteries", "Cables_And_Adapters" ] } { "_id" : "Shop_Top_Products", "childs" : [ "IPad", "IPhone", "IPod", "Blackberry" ] } //parent: { "_id" : "Electronics", "childs" : [ "Cameras_and_Photography", "Cell_Phones_and_Accessories", "Shop_Top_Products" ] }
As you see, we have ordered array childs, which can be used to sort the result set on a client
var descendants=[] var stack=[]; var item = db.categoriesCRO.findOne({_id:"Cell_Phones_and_Accessories"}); stack.push(item); while (stack.length>0){ var currentnode = stack.pop(); var children = db.categoriesCRO.find({_id:{$in:currentnode.childs}}); while(true === children.hasNext()) { var child = children.next(); descendants.push(child._id); if(child.childs.length>0){ stack.push(child); } } } //Batteries,Cables_And_Adapters,Cell_Phones_and_Smartphones,Headsets,Apple,HTC,Nokia,Samsung descendants.join(",")
var path=[] var item = db.categoriesCRO.findOne({_id:"Nokia"}) while ((item=db.categoriesCRO.findOne({childs:item._id}))) { path.push(item._id); } path.reverse().join(' / '); //Electronics / Cell_Phones_and_Accessories / Cell_Phones_and_Smartphones
Recommended index is putting index on childs:
db.categoriesCRO.ensureIndex( { childs: 1 } )
For each node we store (ID, ParentReference, AncestorReferences)
var ancestorpath = db.categoriesAAO.findOne({_id:'Electronics'}).ancestors; ancestorpath.push('Electronics') db.categoriesAAO.insert({_id:'LG', parent:'Electronics',ancestors:ancestorpath}); //{ "_id" : "LG", "parent" : "Electronics", "ancestors" : [ "Electronics" ] }
moving the node
ancestorpath = db.categoriesAAO.findOne({_id:'Cell_Phones_and_Smartphones'}).ancestors; ancestorpath.push('Cell_Phones_and_Smartphones') db.categoriesAAO.update({_id:'LG'},{$set:{parent:'Cell_Phones_and_Smartphones', ancestors:ancestorpath}}); //{ "_id" : "LG", "ancestors" : [ "Electronics", "Cell_Phones_and_Accessories", "Cell_Phones_and_Smartphones" ], "parent" : "Cell_Phones_and_Smartphones" }
db.categoriesAAO.remove({_id:'LG'});
Note unless you introduce the order field, it is impossible to get ordered list of node children. You should consider another approach if you need order.
db.categoriesAAO.find({$query:{parent:'Electronics'}})
there are two options to get all node descendants. One is classic through recursion:
var ancestors = db.categoriesAAO.find({ancestors:"Cell_Phones_and_Accessories"},{_id:1}); while(true === ancestors.hasNext()) { var elem = ancestors.next(); descendants.push(elem._id); } descendants.join(",") //Cell_Phones_and_Smartphones,Headsets,Batteries,Cables_And_Adapters,Nokia,Samsung,Apple,HTC,Vyacheslav
second is using aggregation framework introduced in MongoDB 2.2:
var aggrancestors = db.categoriesAAO.aggregate([ {$match:{ancestors:"Cell_Phones_and_Accessories"}}, {$project:{_id:1}}, {$group:{_id:{},ancestors:{$addToSet:"$_id"}}} ]) descendants = aggrancestors.result[0].ancestors descendants.join(",") //Vyacheslav,HTC,Samsung,Cables_And_Adapters,Batteries,Headsets,Apple,Nokia,Cell_Phones_and_Smartphones
For each node we store (ID, PathToNode)
var ancestorpath = db.categoriesMP.findOne({_id:'Electronics'}).path; ancestorpath += 'Electronics,' db.categoriesMP.insert({_id:'LG', path:ancestorpath}); //{ "_id" : "LG", "path" : "Electronics," }
moving the node
ancestorpath = db.categoriesMP.findOne({_id:'Cell_Phones_and_Smartphones'}).path; ancestorpath +='Cell_Phones_and_Smartphones,' db.categoriesMP.update({_id:'LG'},{$set:{path:ancestorpath}}); //{ "_id" : "LG", "path" : "Electronics,Cell_Phones_and_Accessories,Cell_Phones_and_Smartphones," }
db.categoriesMP.remove({_id:'LG'});
Note unless you introduce the order field, it is impossible to get ordered list of node children. You should consider another approach if you need order.
db.categoriesMP.find({$query:{path:'Electronics,'}}) //{ "_id" : "Cameras_and_Photography", "path" : "Electronics," } //{ "_id" : "Shop_Top_Products", "path" : "Electronics," } //{ "_id" : "Cell_Phones_and_Accessories", "path" : "Electronics," }
Single select, regexp starts with ^ which allows using the index for matching
var descendants=[] var item = db.categoriesMP.findOne({_id:"Cell_Phones_and_Accessories"}); var criteria = '^'+item.path+item._id+','; var children = db.categoriesMP.find({path: { $regex: criteria, $options: 'i' }}); while(true === children.hasNext()) { var child = children.next(); descendants.push(child._id); } descendants.join(",") //Cell_Phones_and_Smartphones,Headsets,Batteries,Cables_And_Adapters,Nokia,Samsung,Apple,HTC,Vyacheslav
We can obtain path directly from node without issuing additional selects.
var path=[] var item = db.categoriesMP.findOne({_id:"Nokia"}) print (item.path) //Electronics,Cell_Phones_and_Accessories,Cell_Phones_and_Smartphones,
Recommended index is putting index on path
db.categoriesAAO.ensureIndex( { path: 1 } )
For each node we store (ID, left, right).
Left field also can be treated as an order field
Please refer to image above. Assume, we want to insert LG node after shop_top_products(14,23). New node would have left value of 24, affecting all remaining left values according to traversal rules, and will have right value of 25, affecting all remaining right values including root one.
Steps:
var followingsibling = db.categoriesNSO.findOne({_id:"Cell_Phones_and_Accessories"}); var newnode = {_id:'LG', left:followingsibling.left,right:followingsibling.left+1} db.categoriesNSO.update({right:{$gt:followingsibling.right}},{$inc:{right:2}}, false, true) db.categoriesNSO.update({left:{$gte:followingsibling.left}, right:{$lte:followingsibling.right}},{$inc:{left:2, right:2}}, false, true) db.categoriesNSO.insert(newnode)
Let's check the result:
+-Electronics (1,46) +---Cameras_and_Photography (2,13) +------Digital_Cameras (3,4) +------Camcorders (5,6) +------Lenses_and_Filters (7,8) +------Tripods_and_supports (9,10) +------Lighting_and_studio (11,12) +----Shop_Top_Products (14,23) +------IPad (15,16) +------IPhone (17,18) +------IPod (19,20) +------Blackberry (21,22) +----LG (24,25) +----Cell_Phones_and_Accessories (26,45) +------Cell_Phones_and_Smartphones (27,38) +---------Nokia (28,29) +---------Samsung (30,31) +---------Apple (32,33) +---------HTC (34,35) +---------Vyacheslav (36,37) +-------Headsets (39,40) +-------Batteries (41,42) +-------Cables_And_Adapters (43,44)
While potentially rearranging node order within same parent is identical to exchanging node's left and right values, the formal way of moving the node is first removing node from the tree and later inserting it to new location. Node: node removal without removing it's childs is out of scope for this article. For now, we assume, that node to remove has no children, i.e. right-left=1
Steps are identical to adding the node - i.e. we adjusting the space by decreasing affected left/right values, and removing original node.
var nodetoremove = db.categoriesNSO.findOne({_id:"LG"}); if((nodetoremove.right-nodetoremove.left-1)>0.001) { print("Only node without childs can be removed") exit } var followingsibling = db.categoriesNSO.findOne({_id:"Cell_Phones_and_Accessories"}); //update all remaining nodes db.categoriesNSO.update({right:{$gt:nodetoremove.right}},{$inc:{right:-2}}, false, true) db.categoriesNSO.update({left:{$gt:nodetoremove.right}},{$inc:{left:-2}}, false, true) db.categoriesNSO.remove({_id:"LG"});
Let's check result:
+-Electronics (1,44) +--Cameras_and_Photography (2,13) +-----Digital_Cameras (3,4) +-----Camcorders (5,6) +-----Lenses_and_Filters (7,8) +-----Tripods_and_supports (9,10) +-----Lighting_and_studio (11,12) +---Shop_Top_Products (14,23) +-----IPad (15,16) +-----IPhone (17,18) +-----IPod (19,20) +-----Blackberry (21,22) +---Cell_Phones_and_Accessories (24,43) +-----Cell_Phones_and_Smartphones (25,36) +--------Nokia (26,27) +--------Samsung (28,29) +--------Apple (30,31) +--------HTC (32,33) +--------Vyacheslav (34,35) +------Headsets (37,38) +------Batteries (39,40) +------Cables_And_Adapters (41,42)
moving the node can be within same parent, or to another parent. If the same parent, and nodes are without childs, than you need just to exchange nodes (left,right) pairs.
Formal way is to remove node and insert to new destination, thus the same restriction apply - only node without children can be moved. If you need to move subtree, consider creating mirror of the existing parent under new location, and move nodes under the new parent one by one. Once all nodes moved, remove obsolete old parent.
As an example, lets move LG node from the insertion example under the Cell_Phones_and_Smartphones node, as a last sibling (i.e. you do not have following sibling node as in the insertion example)
Step 1 would be to remove LG node from tree using node removal procedure described above Step2 is to take right value of the new parent. New node will have left value of the parent's right value and right value - incremented by one parent's right one Now we have to create the place for the new node: update affects right values of all nodes on a further traversal path
var newparent = db.categoriesNSO.findOne({_id:"Cell_Phones_and_Smartphones"}); var nodetomove = {_id:'LG', left:newparent.right,right:newparent.right+1} //3th and 4th parameters: false stands for upsert=false and true stands for multi=true db.categoriesNSO.update({right:{$gte:newparent.right}},{$inc:{right:2}}, false, true) db.categoriesNSO.update({left:{$gte:newparent.right}},{$inc:{left:2}}, false, true) db.categoriesNSO.insert(nodetomove)
Let's check result:
+-Electronics (1,46) +--Cameras_and_Photography (2,13) +-----Digital_Cameras (3,4) +-----Camcorders (5,6) +-----Lenses_and_Filters (7,8) +-----Tripods_and_supports (9,10) +-----Lighting_and_studio (11,12) +---Shop_Top_Products (14,23) +-----IPad (15,16) +-----IPhone (17,18) +-----IPod (19,20) +-----Blackberry (21,22) +---Cell_Phones_and_Accessories (24,45) +-----Cell_Phones_and_Smartphones (25,38) +---------Nokia (26,27) +---------Samsung (28,29) +---------Apple (30,31) +---------HTC (32,33) +---------Vyacheslav (34,35) +---------LG (36,37) +-------Headsets (39,40) +-------Batteries (41,42) +-------Cables_And_Adapters (43,44)
Note, unless all node childs have no childrens theirselfs it is impossible to get node direct childs. Consider using modified approach of combining NestedSets with parent field.
This is core stength of this approach - all descendants retrieved using one select to DB. Moreover, by sorting by node left - the dataset is ready for traversal in a correct order
var descendants=[] var item = db.categoriesNSO.findOne({_id:"Cell_Phones_and_Accessories"}); print ('('+item.left+','+item.right+')') var children = db.categoriesNSO.find({left:{$gt:item.left}, right:{$lt:item.right}}).sort(left:1); while(true === children.hasNext()) { var child = children.next(); descendants.push(child._id); } descendants.join(",") //Cell_Phones_and_Smartphones,Headsets,Batteries,Cables_And_Adapters,Nokia,Samsung,Apple,HTC,Vyacheslav
Retrieving path to node is also elegant and can be done using single query to database
var path=[] var item = db.categoriesNSO.findOne({_id:"Nokia"}) var ancestors = db.categoriesNSO.find({left:{$lt:item.left}, right:{$gt:item.right}}).sort({left:1}) while(true === ancestors.hasNext()) { var child = ancestors.next(); path.push(child._id); } path.join('/') // Electronics/Cell_Phones_and_Accessories/Cell_Phones_and_Smartphones
Recommended index is putting index on left and right values:
db.categoriesAAO.ensureIndex( { left: 1, right:1 } )
For each node we store (ID, Parent, Order,left, right).
Left field also is treated as an order field, so we could omit order field. But from other hand we can leave it, so we can use Parent Reference with order data to reconstruct left/right values in case of accidental corruption, or, for example during initial import.
Adding new node can be adopted from Nested Sets in this manner:
var followingsibling = db.categoriesNSO.findOne({_id:"Cell_Phones_and_Accessories"}); var previoussignling = db.categoriesNSO.findOne({_id:"Shop_Top_Products"}); var neworder = parseInt((followingsibling.order + previoussignling.order)/2); var newnode = {_id:'LG', left:followingsibling.left,right:followingsibling.left+1, parent:followingsibling.parent, order:neworder}; db.categoriesNSO.update({right:{$gt:followingsibling.right}},{$inc:{right:2}}, false, true) db.categoriesNSO.update({left:{$gte:followingsibling.left}, right:{$lte:followingsibling.right}},{$inc:{left:2, right:2}}, false, true) db.categoriesNSO.insert(newnode)
Before insertion
+----Cameras_and_Photography (2,13) ord.[10] +-----Shop_Top_Products (14,23) ord.[20] +-----Cell_Phones_and_Accessories (26,45) ord.[30]
After insertion:
+--Electronics (1,46) +----Cameras_and_Photography (2,13) ord.[10] +-------Digital_Cameras (3,4) ord.[10] +-------Camcorders (5,6) ord.[20] +-------Lenses_and_Filters (7,8) ord.[30] +-------Tripods_and_supports (9,10) ord.[40] +-------Lighting_and_studio (11,12) ord.[50] +-----Shop_Top_Products (14,23) ord.[20] +-------IPad (15,16) ord.[10] +-------IPhone (17,18) ord.[20] +-------IPod (19,20) ord.[30] +-------Blackberry (21,22) ord.[40] +-----LG (24,25) ord.[25] +-----Cell_Phones_and_Accessories (26,45) ord.[30] +-------Cell_Phones_and_Smartphones (27,38) ord.[10] +----------Nokia (28,29) ord.[10] +----------Samsung (30,31) ord.[20] +----------Apple (32,33) ord.[30] +----------HTC (34,35) ord.[40] +----------Vyacheslav (36,37) ord.[50] +--------Headsets (39,40) ord.[20] +--------Batteries (41,42) ord.[30] +--------Cables_And_Adapters (43,44) ord.[40]
Identical to insertion approach
Approach from Nested Sets is used.
Now is possible by using (Parent,Order) pair
db.categoriesNSO.find({parent:"Electronics"}).sort({order:1}); /* { "_id" : "Cameras_and_Photography", "parent" : "Electronics", "order" : 10, "left" : 2, "right" : 13 } { "_id" : "Shop_Top_Products", "parent" : "Electronics", "order" : 20, "left" : 14, "right" : 23 } { "_id" : "LG", "left" : 24, "right" : 25, "parent" : "Electronics", "order" : 25 } { "_id" : "Cell_Phones_and_Accessories", "parent" : "Electronics", "order" : 30, "left" : 26, "right" : 45 } */
Approach from Nested Sets is used.
Approach from nested sets is used
Code can be downloaded from repository https://github.com/Voronenko/Storing_TreeView_Structures_WithMongoDB
All files are packaged according to the following naming convention:
Please note, that MongoDB does not provide ACID transactions. This means, that for update operations splitted into separate update commands, your application should implement additional code to support your code specific transactions.
Formal advise from 10gen is following:
You are free to mix patterns (by introducing order field, etc) to match the data operations required to your application.