leonzhx

A Brief Introduction to MongoDB

 

1.  MongoDB is a database management system designed for web applications and internet infrastructure. The data model and persistence strategies are built for high read and write throughput and the ability to scale easily with automatic failover. Whether an application requires just one database node or dozens of them, MongoDB can provide surprisingly good performance.

 

2.  Because a document-based data model can represent rich, hierarchical data structures, it’s often possible to do without the complicated multi-table joins imposed by relational databases. When you open the MongoDB JavaScript shell, you can easily get a comprehensible representation of your product with all its information hierarchically organized in a JSON-like structure. With MongoDB, the object defined in the programming language can be persisted “as is,” removing some of the complexity of object mappers.

 

3.  MongoDB was originally developed by 10gen for a platform hosting web applications that, by definition, required its database to scale gracefully across multiple machines. Its design as a horizontally scalable primary data store sets it apart from other modern database systems.

 

4.  MongoDB’s data model is document-oriented. A document is essentially a set of property names and their values. The values can be simple data types, such as strings, numbers, and dates. But these values can also be arrays and even other documents. A document-oriented data model naturally represents data in an aggregate form, allowing you to work with an object holistically.
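
As a minimal sketch of the document shape described above, the following uses a plain Python dictionary. The product and all of its field names are hypothetical and chosen only for illustration; they do not come from any real dataset.

```python
from datetime import datetime, timezone

# A hypothetical product document: simple values, an array,
# an embedded document, and an array of embedded documents.
product = {
    "_id": 1,
    "name": "Extra Large Wheelbarrow",
    "price": 74.99,
    "added": datetime(2012, 1, 15, tzinfo=timezone.utc),
    "tags": ["tools", "gardening"],
    "details": {"weight_kg": 21.5, "color": "red"},
    "reviews": [
        {"user": "alice", "rating": 5},
        {"user": "bob", "rating": 4},
    ],
}


def average_rating(doc):
    """Use the aggregate as a whole: the reviews travel with the product."""
    ratings = [review["rating"] for review in doc["reviews"]]
    return sum(ratings) / len(ratings)


print(average_rating(product))
```

Because the reviews are embedded in the product, reading them needs no second lookup, which is the "holistic" access the note refers to.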

 

5.  Documents need not conform to a prespecified schema. MongoDB groups documents into collections, containers that don’t impose any sort of schema. In theory, each document in a collection can have a completely different structure; in practice, a collection’s documents will be relatively uniform.

 

6.  To say that a system supports ad hoc queries is to say that it’s not necessary to define in advance what sorts of queries the system will accept. Relational databases have this property; key-value store systems usually sacrifice rich query power in exchange for a simple scalability model. MongoDB, however, still supports ad hoc queries.
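
To make "ad hoc" concrete, here is a rough in-memory sketch of matching documents against a query that is only known at call time. The `matches` and `find` helpers are hypothetical and model only equality and four comparison operators named after MongoDB's; this is not how the server evaluates queries.

```python
import operator

# Only a small, assumed subset of comparison operators is modelled here.
OPERATORS = {
    "$gt": operator.gt,
    "$gte": operator.ge,
    "$lt": operator.lt,
    "$lte": operator.le,
}


def matches(doc, query):
    """Return True if doc satisfies every condition in query."""
    for field, condition in query.items():
        if field not in doc:
            return False
        value = doc[field]
        if isinstance(condition, dict):
            # Operator form, e.g. {"age": {"$gt": 30, "$lt": 40}}
            if not all(OPERATORS[op](value, arg) for op, arg in condition.items()):
                return False
        elif value != condition:
            # Plain form, e.g. {"city": "Shanghai"}
            return False
    return True


def find(collection, query):
    """Scan the collection and keep the documents that match."""
    return [doc for doc in collection if matches(doc, query)]


users = [
    {"name": "alice", "age": 31, "city": "Shanghai"},
    {"name": "bob", "age": 25, "city": "Beijing"},
    {"name": "carol", "age": 42, "city": "Shanghai"},
]

older_in_shanghai = find(users, {"city": "Shanghai", "age": {"$gt": 35}})
print([user["name"] for user in older_in_shanghai])
```

Nothing about the query had to be declared when the collection was created; any combination of fields and conditions can be supplied later.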

 

7.  Proper indexes will increase query and sort speeds by orders of magnitude; consequently, any system that supports ad hoc queries should also support secondary indexes. Secondary indexes in MongoDB are implemented as B-trees. B-tree indexes, also the default for most relational databases, are optimized for a variety of queries, including range scans and queries with sort clauses. By permitting multiple secondary indexes, MongoDB allows users to optimize for a wide variety of queries. With MongoDB, you can create up to 64 indexes per collection.
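
The benefit of an ordered index for range scans and sorts can be sketched with a sorted list and binary search. This is a deliberate simplification: a real B-tree is a balanced multi-way tree, and the `SortedIndex` class below is a hypothetical stand-in that shares only the "keys are kept in order" property.

```python
import bisect


class SortedIndex:
    """Stand-in for an ordered secondary index on one field."""

    def __init__(self, docs, field):
        # (key, position) pairs kept sorted by key.
        self.entries = sorted((doc[field], pos) for pos, doc in enumerate(docs))
        self.keys = [key for key, _ in self.entries]

    def range(self, low, high):
        """Positions of documents with low <= key <= high, in key order."""
        start = bisect.bisect_left(self.keys, low)
        end = bisect.bisect_right(self.keys, high)
        return [pos for _, pos in self.entries[start:end]]


docs = [
    {"name": "a", "price": 30},
    {"name": "b", "price": 10},
    {"name": "c", "price": 20},
    {"name": "d", "price": 40},
]

by_price = SortedIndex(docs, "price")
hits = [docs[pos]["name"] for pos in by_price.range(15, 35)]
print(hits)
```

The range lookup needs two binary searches instead of a full scan, and the results come back already sorted by the indexed field.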

 

8.  MongoDB provides database replication via a topology known as a replica set. Replica sets distribute data across machines for redundancy and automate failover in the event of server and network outages. Replica sets consist of exactly one primary node and one or more secondary nodes. A replica set’s primary node can accept both reads and writes, but the secondary nodes are read-only. If the primary node fails, the cluster will pick a secondary node and automatically promote it to primary. When the former primary comes back online, it’ll do so as a secondary.
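
The failover behaviour described above can be sketched as a small state machine. The `ReplicaSet` class and its member names are hypothetical, and the "election" simply promotes the first available secondary; real elections take more into account, and none of that is modelled here.

```python
class ReplicaSet:
    """Toy model: one PRIMARY, the rest SECONDARY or DOWN."""

    def __init__(self, names):
        self.members = {name: "SECONDARY" for name in names}
        self.members[names[0]] = "PRIMARY"

    def primary(self):
        for name, state in self.members.items():
            if state == "PRIMARY":
                return name
        return None

    def _elect(self):
        # Simplification: promote the first healthy secondary.
        for name, state in self.members.items():
            if state == "SECONDARY":
                self.members[name] = "PRIMARY"
                return

    def fail(self, name):
        was_primary = self.members[name] == "PRIMARY"
        self.members[name] = "DOWN"
        if was_primary:
            self._elect()

    def recover(self, name):
        # A returning node always rejoins as a secondary.
        self.members[name] = "SECONDARY"
        if self.primary() is None:
            self._elect()


rs = ReplicaSet(["node-a", "node-b", "node-c"])
rs.fail("node-a")
print(rs.primary())
rs.recover("node-a")
print(rs.members)
```

After `node-a` fails, `node-b` takes over; when `node-a` returns it stays a secondary rather than reclaiming the primary role.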

 

9.  In MongoDB, users control the speed and durability trade-off by choosing write semantics and deciding whether to enable journaling. All writes, by default, are fire-and-forget, which means that these writes are sent across a TCP socket without requiring a database response. If users want a response, they can issue a write using a special safe mode provided by all drivers. This forces a response, ensuring that the write has been received by the server with no errors. Safe mode is configurable; it can also be used to block until a write has been replicated to some number of servers.

 

10.  In MongoDB v2.0, journaling is enabled by default. With journaling, every write is committed to an append-only log. If the server is ever shut down uncleanly (say, in a power outage), the journal will be used to ensure that MongoDB’s data files are restored to a consistent state when you restart the server.
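
The idea of recovering from an append-only log can be sketched as a generic write-ahead journal. The `JournaledStore` class, the JSON-lines file format, and the file name are all assumptions made for this example; MongoDB's actual journal format and commit schedule are not described in these notes and are not modelled.

```python
import json
import os
import tempfile


class JournaledStore:
    """Toy key-value store that logs every write before applying it."""

    def __init__(self, journal_path):
        self.journal_path = journal_path
        self.data = {}
        self._replay()

    def _replay(self):
        # On startup, rebuild state by re-applying journaled writes in order.
        if not os.path.exists(self.journal_path):
            return
        with open(self.journal_path, encoding="utf-8") as journal:
            for line in journal:
                entry = json.loads(line)
                self.data[entry["key"]] = entry["value"]

    def put(self, key, value):
        # Append to the journal and force it to disk before updating memory.
        with open(self.journal_path, "a", encoding="utf-8") as journal:
            journal.write(json.dumps({"key": key, "value": value}) + "\n")
            journal.flush()
            os.fsync(journal.fileno())
        self.data[key] = value


path = os.path.join(tempfile.mkdtemp(), "journal.log")

store = JournaledStore(path)
store.put("user:1", {"name": "alice"})
store.put("user:1", {"name": "alice", "visits": 2})
del store  # simulate an unclean shutdown: in-memory state is lost

recovered = JournaledStore(path)
print(recovered.data)
```

Because every write reaches the log before it is applied, replaying the log after a crash reproduces the last acknowledged state.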

 

11.  The technique of augmenting a single node’s hardware for scale is known as vertical scaling or scaling up. Vertical scaling has the advantages of being simple, reliable, and cost-effective up to a certain point. Instead of beefing up a single node, scaling horizontally (or scaling out) means distributing the database across multiple machines. The distribution of data across machines mitigates the consequences of failure.

 

12. MongoDB has been designed to make horizontal scaling manageable. It does so via a range-based partitioning mechanism, known as auto-sharding, which automatically manages the distribution of data across nodes. The sharding system handles the addition of shard nodes, and it also facilitates automatic failover. Individual shards are made up of a replica set consisting of at least two nodes, ensuring automatic recovery with no single point of failure. All this means that no application code has to handle these logistics; your application code communicates with a sharded cluster just as it speaks to a single node.
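
Range-based partitioning can be sketched as a lookup from a shard key to the shard that owns its range. The `RangeRouter` class, the split points, and the shard names below are hypothetical; the sketch ignores how ranges are split and rebalanced, which the sharding system handles automatically.

```python
import bisect


class RangeRouter:
    """Route a shard key to the shard that owns its key range."""

    def __init__(self, split_points, shards):
        # N split points divide the key space into N + 1 contiguous ranges.
        if len(shards) != len(split_points) + 1:
            raise ValueError("need exactly one more shard than split points")
        self.split_points = sorted(split_points)
        self.shards = list(shards)

    def shard_for(self, key):
        # Ranges are [lower, upper): a key equal to a split point
        # belongs to the range that starts at that split point.
        return self.shards[bisect.bisect_right(self.split_points, key)]


router = RangeRouter(["h", "q"], ["shard0", "shard1", "shard2"])

for username in ["alice", "h", "mallory", "zed"]:
    print(username, "->", router.shard_for(username))
```

The application only ever asks the router; which shard answers depends on where the key falls, not on any logic in the application code.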

 

13. MongoDB is written in C++. The project compiles on all major operating systems, including Mac OS X, Windows, and most flavors of Linux. MongoDB is open source and licensed under the GNU-AGPL. The source code is freely available on GitHub. The project is guided by the 10gen core server team.

 

14. The core database server runs via an executable called mongod (mongod.exe on Windows). The mongod server process receives commands over a network socket using a custom binary protocol. All the data files for a mongod process are stored by default in /data/db (c:\data\db on Windows).

 

15. mongod can be run in several modes, the most common of which is as a member of a replica set. Replica set configurations generally consist of two replicas, plus an arbiter process residing on a third server. For MongoDB’s auto-sharding architecture, the components consist of mongod processes configured as per-shard replica sets, with special metadata servers, known as config servers, on the side. A separate routing server called mongos is also used to send requests to the appropriate shard.

 

16. The MongoDB command shell is a JavaScript-based tool for administering the database and manipulating data. The mongo executable loads the shell and connects to a specified mongod process. Most commands are issued using JavaScript expressions. The shell also permits you to run administrative commands. Some examples include viewing the current database operation, checking the status of replication to a secondary node, and configuring a collection for sharding.

17. All documents require a primary key stored in the _id field. You’re allowed to enter a custom _id as long as you can guarantee its uniqueness. But if you omit the _id altogether, then a MongoDB object ID will be inserted automatically.
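
The rule "use the supplied _id if it is unique, otherwise generate one" can be sketched as follows. The 12-byte layout used here (4-byte timestamp, 5 random bytes fixed per process, 3-byte counter) is an assumption based on the ObjectId layout of recent MongoDB versions; older releases composed the middle bytes differently. The `insert` helper and the dict-based collection are hypothetical.

```python
import itertools
import os
import struct
import time

# Assumed layout: 4-byte timestamp + 5-byte per-process random + 3-byte counter.
_PROCESS_UNIQUE = os.urandom(5)
_COUNTER = itertools.count(int.from_bytes(os.urandom(3), "big"))


def new_object_id():
    """Return 12 bytes that are unique within this process."""
    timestamp = struct.pack(">I", int(time.time()))
    counter = (next(_COUNTER) % 0x1000000).to_bytes(3, "big")
    return timestamp + _PROCESS_UNIQUE + counter


def insert(collection, doc):
    """Insert into a dict keyed by _id, generating an _id when none is given."""
    doc = dict(doc)
    doc.setdefault("_id", new_object_id())
    if doc["_id"] in collection:
        raise KeyError(f"duplicate _id: {doc['_id']!r}")
    collection[doc["_id"]] = doc
    return doc["_id"]


users = {}
custom_id = insert(users, {"_id": "alice", "age": 31})
generated_id = insert(users, {"name": "bob"})
print(custom_id, generated_id.hex())
```

Leading with the timestamp means generated ids sort roughly by creation time, while the counter keeps ids created within the same second distinct.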

18. All of the MongoDB drivers implement similar methods for saving a document to a collection, but the representation of the document itself will usually be whatever is most natural to each language. In Java you represent documents using a special document builder class that extends LinkedHashMap.

19. Little abstraction beyond the driver itself is required to build an application. Many developers like using a thin wrapper (such as Morphia for Java) over the drivers to handle associations, validations, and type checking.

 

20. MongoDB is bundled with several command-line utilities:
  a)  mongodump and mongorestore: Standard utilities for backing up and restoring a database. mongodump saves the database’s data in its native BSON format; this tool has the advantage of being usable for hot backups, which can easily be restored with mongorestore.
  b)  mongoexport and mongoimport: These utilities export and import JSON, CSV, and TSV data.
  c)  mongosniff: A wire-sniffing tool for viewing operations sent to the database. It essentially translates the BSON going over the wire into human-readable shell statements.
  d)  mongostat: Similar to iostat; it constantly polls MongoDB and the system to provide helpful stats, including the number of operations per second, the amount of virtual memory allocated, and the number of connections to the server.

  Please refer to http://docs.mongodb.org/manual/reference/ for details.

21. MongoDB is well suited as a primary data store for web applications, for analytics and logging applications, and for any application requiring a medium-grade cache. In addition, because it easily stores schemaless data, MongoDB is also good for capturing data whose structure can’t be known in advance.

22. The best-known simple key-value store is memcached (pronounced mem-cash-dee). Memcached stores its data in memory only, so it trades persistence for speed. It’s also distributed; memcached nodes running across multiple servers can act as a single data store, eliminating the complexity of maintaining cache state across machines. Compared with MongoDB, a simple key-value store like memcached will often allow for faster reads and writes. But unlike MongoDB, these systems can rarely act as primary data stores. Simple key-value stores are best used as adjuncts, either as caching layers atop a more traditional database or as simple persistence layers for ephemeral services like job queues.

 

23. Sophisticated key-value stores manage a relatively self-contained domain that demands significant storage and availability. Because of their masterless architecture, these systems scale easily with the addition of nodes. They opt for eventual consistency, which means that reads don’t necessarily reflect the latest write. But what users get in exchange for weaker consistency is the ability to write in the face of any one node’s failure. This contrasts with MongoDB, which provides strong consistency, a single master (per shard), a richer data model, and secondary indexes.

 

24. MongoDB and RDBMSs are both capable of representing a rich data model, although where an RDBMS uses fixed-schema tables, MongoDB has schema-free documents. Both support B-tree indexes. An RDBMS supports both joins and transactions, so if you must use SQL or if you require transactions, then you’ll need to use an RDBMS. That said, MongoDB’s document model is often rich enough to represent objects without requiring joins. And its updates can be applied atomically to individual documents, providing a subset of what’s possible with traditional transactions. Both MongoDB and RDBMSs support replication. MongoDB has been designed to scale horizontally, with sharding and failover handled automatically. Any sharding on an RDBMS has to be managed manually, and given the complexity involved, it’s more common to see a vertically scaled RDBMS.

25. MongoDB works well for analytics and logging. MongoDB’s relevance to analytics derives from its speed and from two key features: targeted atomic updates and capped collections. Atomic updates let clients efficiently increment counters and push values onto arrays. Capped collections, often useful for logging, feature fixed allocation, which lets them age out automatically. Storing logging data in a database, as compared with the file system, provides easier organization and much greater query power. Now, instead of using grep or a custom log search utility, users can employ the MongoDB query language they know and love to examine log output.
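
The two features named above can be sketched in memory. `apply_update` is a hypothetical helper that mimics the effect of `$inc` and `$push` on one document; it does not reproduce the server's atomicity. The capped collection is approximated with a bounded deque that caps the number of entries, which is a simplification of the fixed allocation the note describes.

```python
from collections import deque


def apply_update(doc, update):
    """Apply $inc and $push modifiers to a single document in place."""
    for field, amount in update.get("$inc", {}).items():
        doc[field] = doc.get(field, 0) + amount
    for field, value in update.get("$push", {}).items():
        doc.setdefault(field, []).append(value)
    return doc


# Analytics: bump a counter and record a referrer in one targeted update.
page_stats = {"_id": "/home"}
apply_update(page_stats, {"$inc": {"views": 1}, "$push": {"referrers": "google"}})
apply_update(page_stats, {"$inc": {"views": 1}, "$push": {"referrers": "twitter"}})
print(page_stats)

# Logging: a bounded buffer in which the oldest entries age out automatically.
recent_log = deque(maxlen=3)
for seq in range(5):
    recent_log.append({"seq": seq, "msg": f"request {seq}"})
print([entry["seq"] for entry in recent_log])
```

The client never reads the counter before writing it, and the log never needs an explicit cleanup job; those are the two properties that make the features attractive for analytics and logging.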

26. MongoDB should usually be run on 64-bit machines. 32-bit systems are capable of addressing only 4 GB of memory. Acknowledging that typically half of this memory will be allocated by the operating system and program processes, this leaves just 2 GB of memory on which to map the data files. So if you’re running 32-bit, and if you have even a modest number of indexes defined, you’ll be strictly limited to as little as 1.5 GB of data.
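
The arithmetic behind these figures can be written out directly. The half-and-half split comes from the note itself; the 0.5 GB taken by indexes is an illustrative assumption that happens to reproduce the "as little as 1.5 GB" figure, not a measured value.

```python
GIB = 1024 ** 3

address_space = 2 ** 32                      # bytes addressable with 32 bits
os_and_process = address_space // 2          # "typically half", per the note
mappable = address_space - os_and_process    # left for mapping data files

assumed_index_bytes = GIB // 2               # assumption: a modest set of indexes
data_capacity = mappable - assumed_index_bytes

print(address_space / GIB, mappable / GIB, data_capacity / GIB)
```

The limit comes from the size of the address space rather than from installed RAM, which is why adding memory to a 32-bit machine does not raise it.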

 

27. A second consequence of using virtual memory mapping is that memory for the data will be allocated automatically, as needed. This makes it trickier to run the database in a shared environment. As with database servers in general, MongoDB is best run on a dedicated server.

 

28. It’s important to run MongoDB with replication, especially if you’re not running with journaling enabled. Because MongoDB uses memory-mapped files, any unclean shutdown of a mongod not running with journaling may result in corruption. Therefore, it’s necessary in this case to have a replicated backup available for failover.

 

Why MongoDB?

  • Document-oriented
    • Documents (objects) map nicely to programming language data types
    • Embedded documents and arrays reduce need for joins
    • Dynamically-typed (schemaless) for easy schema evolution
    • No joins and no multi-document transactions for high performance and easy scalability
  • High performance
    • The absence of joins, together with embedding, makes reads and writes fast
    • Indexes including indexing of keys from embedded documents and arrays
    • Optional streaming writes (no acknowledgements)
  • High availability
    • Replicated servers with automatic master failover
  • Easy scalability
    • Automatic sharding (auto-partitioning of data across servers)
      • Reads and writes are distributed over shards
      • No joins or multi-document transactions make distributed queries easy and fast
    • Eventually-consistent reads can be distributed over replicated servers
  • Rich query language

Large MongoDB deployment

1.       One or more shards, each holding a portion of the total data (managed automatically). Reads and writes are automatically routed to the appropriate shard(s). Each shard is backed by a replica set, which holds only the data for that shard.

2.       A replica set is one or more servers, each holding a copy of the same data. At any given time one is primary and the rest are secondaries. If the primary goes down, one of the secondaries automatically takes over as primary. All writes and consistent reads go to the primary; eventually consistent reads are distributed among the secondaries.

3.       Multiple config servers, each holding a copy of the metadata that indicates which data lives on which shard.

4.       One or more routers, each acting as a server for one or more clients. Clients issue queries and updates to a router, which consults the config servers and routes each request to the appropriate shard.

5.       One or more clients, each of which is (part of) the user's application and issues commands to a router via the mongo client library (driver) for its language.

mongod is the server program (data or config). mongos is the router program.


Small deployment (no partitioning)

1.       One replica set (automatic failover), or one server with zero or more slaves (no automatic failover).

2.       One or more clients issuing commands to the replica set as a whole or the single master (the driver will manage which server in the replica set to send to).

Mongo data model

  • A Mongo system (see deployment above) holds a set of databases
  • A database holds a set of collections
  • A collection holds a set of documents
  • A document is a set of fields
  • A field is a key-value pair
  • A key is a name (string)
  • A value is a
    • basic type like string, integer, float, timestamp, binary, etc.,
    • a document, or
    • an array of values