Introduction
Pros of MongoDB:
- The document model is faster and easier for developers to work with. JSON-like documents can almost always be mapped directly onto application objects, so there is often no need for an ORM.
- The schema is easier to understand and maintain: related data can often be embedded in a single document, so one-to-many and many-to-many relations need fewer extra tables and join tables.
- The schema is flexible. New, unplanned fields can be added to documents as business requirements change.
- A flexible schema often removes the need for schema migrations managed with tools such as Flyway or Liquibase.
- MongoDB also supports schema validation, so you can still enforce document structure where you need it.
- MongoDB stores data in binary JSON (BSON). BSON is richer than plain JSON because it supports additional types such as int, long, double (float), decimal, date and binary data.
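A minimal sketch in plain Python of the first two points: a hypothetical order document with embedded line items is mapped straight onto dataclasses, with no ORM or join. The hard-coded dict stands in for what a driver such as PyMongo would return (PyMongo decodes BSON dates as datetime objects); the Order and LineItem classes and all field names are illustrative assumptions.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Any

# Hypothetical order document, shaped like a dict a driver might return.
# Line items are embedded in the order instead of living in a separate table.
order_doc: dict[str, Any] = {
    "_id": "order-1001",
    "customer": "alice",
    "created_at": datetime(2024, 1, 15, 9, 30),
    "items": [
        {"sku": "A-1", "qty": 2, "price": 9.5},
        {"sku": "B-7", "qty": 1, "price": 20.0},
    ],
}


@dataclass
class LineItem:
    sku: str
    qty: int
    price: float


@dataclass
class Order:
    id: str
    customer: str
    created_at: datetime
    items: list[LineItem] = field(default_factory=list)

    @classmethod
    def from_document(cls, doc: dict[str, Any]) -> "Order":
        # The nested document maps one-to-one onto nested objects.
        return cls(
            id=doc["_id"],
            customer=doc["customer"],
            created_at=doc["created_at"],
            items=[LineItem(**item) for item in doc.get("items", [])],
        )

    def total(self) -> float:
        return sum(item.qty * item.price for item in self.items)


order = Order.from_document(order_doc)
print(order.total())  # 39.0
```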
Working With Document Data:
MongoDB provides various ways of working with the data:
- Query API - ad hoc queries, indexing, and real-time aggregations.
- Aggregation Pipeline - a multistage pipeline that transforms documents into aggregated results. Stages include filtering ($match), reshaping ($project), grouping ($group), sorting ($sort) and other transformations.
- ACID transactions - MongoDB supports multi-document transactions with the same all-or-nothing guarantees as relational databases. Read and write concerns let developers choose the balance of consistency and performance the application requires.
- Change Streams - Mongo can notify consuming applications about changes made to the documents.
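To make the aggregation pipeline idea concrete, here is a toy evaluator in plain Python that supports only three stages ($match with equality, $group with the $sum accumulator, and $sort). It is an illustration, not MongoDB's engine; the pipeline list itself has the same shape one would pass to PyMongo's Collection.aggregate(), and the orders data is made up.

```python
from typing import Any

Doc = dict[str, Any]


def field_value(doc: Doc, expr: Any) -> Any:
    """Resolve "$field" to the document's field value; return literals unchanged."""
    if isinstance(expr, str) and expr.startswith("$"):
        return doc.get(expr[1:])
    return expr


def run_pipeline(docs: list[Doc], pipeline: list[Doc]) -> list[Doc]:
    """Toy evaluator for $match (equality only), $group ($sum only) and $sort."""
    for stage in pipeline:
        (op, spec), = stage.items()  # each stage is a single-key document
        if op == "$match":
            docs = [d for d in docs if all(d.get(k) == v for k, v in spec.items())]
        elif op == "$group":
            groups: dict[Any, Doc] = {}
            for d in docs:
                key = field_value(d, spec["_id"])
                out = groups.setdefault(key, {"_id": key, **{f: 0 for f in spec if f != "_id"}})
                for out_field, acc in spec.items():
                    if out_field == "_id":
                        continue
                    (acc_op, acc_arg), = acc.items()
                    if acc_op != "$sum":
                        raise NotImplementedError(acc_op)
                    # {"$sum": 1} counts documents; {"$sum": "$amount"} adds up a field.
                    out[out_field] += field_value(d, acc_arg) or 0
            docs = list(groups.values())
        elif op == "$sort":
            # Sort by the last key first; stable sorts make the first key win.
            for k, direction in reversed(list(spec.items())):
                docs = sorted(docs, key=lambda d: d[k], reverse=(direction == -1))
        else:
            raise NotImplementedError(op)
    return docs


orders = [
    {"customer": "alice", "status": "paid", "amount": 30},
    {"customer": "bob", "status": "paid", "amount": 45},
    {"customer": "alice", "status": "paid", "amount": 20},
    {"customer": "bob", "status": "cancelled", "amount": 99},
]
pipeline = [
    {"$match": {"status": "paid"}},
    {"$group": {"_id": "$customer", "total": {"$sum": "$amount"}, "orders": {"$sum": 1}}},
    {"$sort": {"total": -1}},
]
print(run_pipeline(orders, pipeline))
# [{'_id': 'alice', 'total': 50, 'orders': 2}, {'_id': 'bob', 'total': 45, 'orders': 1}]
```

Each stage consumes the previous stage's output, which is why filtering early with $match keeps later stages cheap.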
Distributed Architecture: Scalable, Resilient, and Mission Critical
Replica Sets - a feature for replicating data across the nodes of a cluster. A replica set can have up to 50 members, of which at most 7 can vote. If the primary becomes unavailable, the remaining members elect a new primary. The election algorithm is based on the Raft consensus protocol and takes into account:
- Which members have applied the most recent changes (oplog entries).
- Heartbeats and connectivity status.
- User-defined member priorities.
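The three factors above can be combined into a deliberately simplified model of an election, sketched in plain Python. This is not MongoDB's actual protocol (in reality higher-priority members call elections sooner and voters reject candidates whose oplog is behind their own); the Member fields and the ranking rule are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Member:
    name: str
    last_applied: int  # position of the last applied oplog entry, simplified to an int
    reachable: bool    # result of the most recent heartbeat
    priority: float    # priority 0 means the member can never become primary
    votes: int = 1


def elect_primary(members: list[Member]) -> Optional[str]:
    """Deliberately simplified election, not MongoDB's real algorithm."""
    total_votes = sum(m.votes for m in members)
    reachable_votes = sum(m.votes for m in members if m.reachable)
    # A candidate needs votes from a strict majority of voting members.
    if reachable_votes <= total_votes // 2:
        return None
    candidates = [m for m in members if m.reachable and m.priority > 0]
    if not candidates:
        return None
    # Prefer the most up-to-date member; break ties with the higher priority.
    return max(candidates, key=lambda m: (m.last_applied, m.priority)).name


# The old primary "a" is down; "b" and "c" are equally up to date.
members = [
    Member("a", last_applied=100, reachable=False, priority=1),
    Member("b", last_applied=99, reachable=True, priority=1),
    Member("c", last_applied=99, reachable=True, priority=2),
]
print(elect_primary(members))  # c
```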
Write Concern - a setting that controls the required level of write durability. By default (since MongoDB 5.0), a write is considered successful only once a majority of the voting replica set members have acknowledged it. The setting can be made stricter (for example, also requiring the write to be journaled) or relaxed (for example, w: 1, acknowledgement from the primary only) to reduce write latency.
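The "majority" rule is simple arithmetic, sketched below in plain Python under the assumption that every voting member holds data (no arbiters). In a real application the level is set through the driver, for example PyMongo's WriteConcern(w="majority", j=True); the helper functions here are hypothetical.

```python
from typing import Union


def majority(voting_members: int) -> int:
    """Smallest number of acknowledgements that forms a strict majority."""
    return voting_members // 2 + 1


def write_acknowledged(acks: int, w: Union[int, str], voting_members: int) -> bool:
    """acks counts members (primary included) that confirmed the write.
    Assumes every voting member holds data (no arbiters)."""
    if w == "majority":
        return acks >= majority(voting_members)
    return acks >= int(w)


print([majority(n) for n in (3, 4, 5, 7)])  # [2, 3, 3, 4]
print(write_acknowledged(1, 1, 3), write_acknowledged(1, "majority", 3))  # True False
```

Note that a 4-member set needs the same 3 acknowledgements as a 5-member set, which is one reason replica sets usually have an odd number of voting members.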
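In sharded clusters, mongos can also perform hedged reads (MongoDB 4.4+): for reads with a non-primary read preference such as nearest, it sends the query to two members of each queried shard's replica set and uses whichever responds first. (Routing reads to a low-latency member by ping is the separate nearest read preference.)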
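The "send to two, keep the first answer" behaviour can be sketched with asyncio, where simulated latencies stand in for network round trips. Member names and latencies are hypothetical, and the two targets are assumed to have already been chosen by the read preference; this models the idea only, not mongos internals.

```python
import asyncio


async def query_member(name: str, latency_s: float) -> str:
    # Stand-in for one network round trip to a replica set member.
    await asyncio.sleep(latency_s)
    return f"result from {name}"


async def hedged_read(latencies: dict[str, float]) -> str:
    """Send the same read to two members and keep whichever answers first."""
    targets = list(latencies.items())[:2]
    tasks = [asyncio.create_task(query_member(name, lat)) for name, lat in targets]
    done, pending = await asyncio.wait(tasks, return_when=asyncio.FIRST_COMPLETED)
    for task in pending:
        task.cancel()  # the slower request is no longer needed
    await asyncio.gather(*pending, return_exceptions=True)
    return done.pop().result()


print(asyncio.run(hedged_read({"shard0-a": 0.20, "shard0-b": 0.02})))  # result from shard0-b
```

The trade-off is extra load (every hedged read costs two queries) in exchange for lower tail latency when one member is slow.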
Scale Up, Out, and Across Storage Tiers
MongoDB can shard (partition) data across multiple nodes. Sharding is based on a shard key, similar to the partition key in Cassandra.
Several sharding strategies are available:
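- Ranged Sharding - documents are divided into contiguous ranges of shard key values. Documents with close shard key values are likely to be placed on the same shard.
- Hashed Sharding - documents are distributed based on an MD5 hash of the shard key value. This gives a more even distribution of data across shards and suits high-volume inserts with monotonically increasing keys (such as timestamps), which under ranged sharding would all land on a single shard.
- Zoned Sharding - lets developers define custom data placement by associating ranges of shard key values with zones (groups of shards), for example to keep data in a specific region.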
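The difference between ranged and hashed routing shows up clearly with sequential keys. The plain-Python sketch below uses made-up split points and three shards; its hash is derived from hashlib's MD5 but is only illustrative, since MongoDB computes its hashed-index values differently.

```python
import bisect
import hashlib
import struct
from collections import Counter


def ranged_shard(key: int, split_points: list[int]) -> int:
    """Contiguous key ranges map to shards: with split_points=[1000, 2000],
    keys < 1000 -> shard 0, 1000..1999 -> shard 1, >= 2000 -> shard 2."""
    return bisect.bisect_right(split_points, key)


def key_hash(key: int) -> int:
    """Signed 64-bit integer from the first 8 bytes of an MD5 digest.
    Illustrative only: MongoDB's own hash values differ."""
    digest = hashlib.md5(str(key).encode()).digest()
    return struct.unpack("<q", digest[:8])[0]


def hashed_shard(key: int, num_shards: int) -> int:
    """Split the hash space [-2**63, 2**63) into num_shards equal ranges."""
    return (key_hash(key) + 2**63) * num_shards // 2**64


ids = range(1000, 1100)  # e.g. monotonically increasing ids from a busy insert stream
print(Counter(ranged_shard(k, [1000, 2000]) for k in ids))  # Counter({1: 100})
print(Counter(hashed_shard(k, 3) for k in ids))
```

With ranged routing every new id goes to shard 1, creating a hotspot; with hashed routing the same ids spread across all three shards, at the cost that a range query such as "ids between 1000 and 1100" must be sent to every shard.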
MongoDB also supports tiered storage: MongoDB Atlas Online Archive can automatically move aged data, selected by a date field, to less expensive cold storage in the cloud.
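An age-based archiving rule boils down to comparing a date field with a cutoff. The sketch below models only that selection step in plain Python; the actual rule is configured in Atlas, and the field name, 90-day threshold and sample documents are assumptions.

```python
from datetime import datetime, timedelta, timezone


def split_for_archive(
    docs: list[dict], date_field: str, archive_after_days: int, now: datetime
) -> tuple[list[dict], list[dict]]:
    """Return (hot, cold): cold documents are older than the threshold and
    would be candidates for moving to the archive tier."""
    cutoff = now - timedelta(days=archive_after_days)
    hot = [d for d in docs if d[date_field] >= cutoff]
    cold = [d for d in docs if d[date_field] < cutoff]
    return hot, cold


now = datetime(2024, 6, 1, tzinfo=timezone.utc)
events = [
    {"_id": 1, "created_at": datetime(2024, 5, 20, tzinfo=timezone.utc)},
    {"_id": 2, "created_at": datetime(2023, 12, 1, tzinfo=timezone.utc)},
]
hot, cold = split_for_archive(events, "created_at", archive_after_days=90, now=now)
print([d["_id"] for d in hot], [d["_id"] for d in cold])  # [1] [2]
```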
Privacy and Security
- Authentication - SCRAM-SHA-256 authentication, plus support for LDAP, Windows Active Directory, Kerberos, x.509 certificates and AWS IAM.
- Authorisation - role-based access control (RBAC).
- Auditing - a native auditing facility records database events to an audit log.
- Network Isolation - for Atlas users, each cluster is deployed in its own isolated network in the cloud, separated from other tenants.
- Encryption - data can be encrypted at every stage of the workflow: in transit over the network (TLS), at rest on disk, and so on. MongoDB also offers Client-Side Field Level Encryption (CSFLE), which encrypts individual fields of a document in the application before they are sent to the server.
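Role-based access control means a user is granted roles on a database, and each role bundles a set of allowed actions. The plain-Python sketch below checks that mapping; the action sets are a small, assumed subset of what the built-in read and readWrite roles grant, and it ignores roles such as readAnyDatabase that apply across databases.

```python
# Small, assumed subset of the privilege actions granted by two built-in roles.
ROLE_ACTIONS = {
    "read": {"find", "listCollections"},
    "readWrite": {"find", "listCollections", "insert", "update", "remove"},
}


def is_authorized(user_roles: list[dict[str, str]], action: str, db: str) -> bool:
    """Allow the action if any role granted on that database includes it."""
    return any(
        grant["db"] == db and action in ROLE_ACTIONS.get(grant["role"], set())
        for grant in user_roles
    )


# Role grants in the same {"role": ..., "db": ...} shape used when creating users.
analyst = [{"role": "read", "db": "reporting"}, {"role": "readWrite", "db": "scratch"}]
print(is_authorized(analyst, "find", "reporting"))    # True
print(is_authorized(analyst, "insert", "reporting"))  # False
print(is_authorized(analyst, "insert", "scratch"))    # True
```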
MongoDB Tools
- MongoDB Community Server - free DB server.
- Compass - GUI for MongoDB