`
C_J
  • 浏览: 124908 次
  • 性别: Icon_minigender_1
  • 来自: 北京
社区版块
存档分类
最新评论

Getting Start on Mongodb

阅读更多

题记:

    最近老和同学聊到non-relational-db的领域,今天恰巧看到robbin大哥对这个领域的见解,让我心情澎拜。

 

 

    WEB2.0的兴起暴露了关系型数据库的弊端,推动了非关系型数据库的发展。

    对于WEB应用,强调了高读写操作,海量数据存储,横向扩展,正如robbin大哥说的,关系型数据库的优点在WEB应用面前变得无用武之地:事务一致性、多表查询。

    解决高读写操作则牺牲一致性,内存操作,并异步flush到文件系统;

    解决海量数据则写自己的文件系统;

    解决横向扩展需要解决集群的可拔插。

 

各类non-relational数据库都有各自的特点,MongoDB能支撑海量数据,TC/TT提供很好的高并发读写性能,Cassandra适合集群(它更像一组网络服务尔非数据库)。

 

个人觉得海量数据和对用户提供的高并发必然集群,所以即便MongoDB很好支持了海量存储,但不知集群方面做的怎样,所以Cassandra是值得学习的。

   

关于NOSQL的产品之一——MongoDB

 

MongoDB—Kyle Banker—10gen

MongoDB is JSON document oriented database. These documents are stored in the database as BSON (binary JSON). BSON is efficient, fast, and is richer in type than JSON (i.e. regex support). Documents are grouped in collections which are analogous to relational tables, but are schema free.

GridFS is a specication for storing large binary files like images and videos in MongoDB. Every document has a 4MB limit. GridFS chuncs the large files into such 4MB parts inside a collection, with a saperate metadata collection. MusicNation.com stores all music and video alongside the application data in MongoDB (about 1TB).

MongoDB has its own wire protocol with socket drivers for several languages. The drivers serializes the data to BSON before transfer.

Replication is used for failover and redundancy. Most commonly a master-slave setup is used. It’s also possible to setup a replica pair architecture.

MongoDB provides a custom query language which should be as powerful as SQL. MongoDB understands the internal structures of its documents which enables dynamic queries. Map/reduce functions are also supported in the query language.

BusinessInsider.com has been using MongoDB for two years with 12M page views/month. They like the simplification of the data model. Posts for instance have embedded comments. They also store real-time analytics in MongoDB which enables fast inserts and eased data analysis with dymanic queries. Uses a single MongoDB database server, 3 Apache web servers, and Memcached caching only on the front page.

TweetCongress.org are users of MongoDB and likes that code defines the schema, and one can therefore version control the schema. They use a single master with snapshots on a 64-bit EC2 instance.

SourceForge.net had a large redesign this summer where they moved to MongoDB. Their goal was to store the front pages, project pages, and download pages in a single document. It’s deployed with one master and 5-6 read-only slaves (obviously scaled for reads and reliability).

 

 

Download

The easiest (and recommended) way to install MongoDB is to use the pre-built binaries.

32-bit binaries

Download and extract the 32-bit .zip. The "Production" build is recommended.

64-bit binaries

Download and extract the 64-bit .zip.

Note: 64-bit is recommended, although you must have a 64-bit version of Windows to run that version.

Unzip

Unzip the downloaded binary package to the location of your choice. You may want to rename mongo-xxxxxxx to just "mongo" for convenience.

Create a data directory

By default MongoDB will store data in C:\data\db, but it won't automatically create that folder, so we do so here:

C:\> mkdir \data
C:\> mkdir \data\db

Or you can do this from the Windows Explorer, of course.

Run and connect to the server

The important binaries for a first run are:

  • mongod.exe - the database server
  • mongo.exe - the administrative shell

To run the database, click mongod.exe in Explorer, or run it from a CMD window.

C:\> cd \my_mongo_dir\bin
C:\my_mongo_dir\bin > mongod
Getting A Database Connection

Let's now try manipulating the database with the database shell . (We could perform similar operations from any programming language using an appropriate driver.  The shell is convenient for interactive and administrative use.)

Start the MongoDB JavaScript shell with:

 

Connect to a database server running locally on the default port:

mongodb://localhost

Connect and login to the admin database as user "fred" with password "foobar":

mongodb://fred:foobar@localhost

Connect and login to the "baz" database as user "fred" with password "foobar":

mongodb://fred:foobar@localhost/baz

Connect to a replica pair, with one server on example1.com and another server on example2.com:

mongodb://example1.com:27017,example2.com:27017

Connect to a replica set with three servers running on localhost (on ports 27017, 27018, and 27019):

mongodb://localhost,localhost:27018,localhost:27019

 

 

 

"connecting to:" tells you the name of the database the shell is using. To switch databases, type:

> use mydb
switched to db mydb

To see a list of handy commands, type help.

Tip for Developers with Experience in Other Databases
You may notice, in the examples below, that we never create a database or collection. MongoDB does not require that you do so. As soon as you insert something, MongoDB creates the underlying collection and database. If you query a collection that does not exist, MongoDB treats it as an empty collection.

Switching to a database with the use command won't immediately create the database - the database is created lazily the first time data is inserted. This means that if you use a database for the first time it won't show up in the list provided by `show dbs` until data is inserted.

Each MongoDB server can support multiple databases. Each database is independent, and the data for each database is stored separately, for security and ease of management.

 

 

Using a Large Number of Collections 

 

Generally, having a large number of collections has no significant performance penalty, and results in very good performance.

Limits

By default MongoDB has a limit of approximately 24,000 namespaces per database.  Each collection counts as a namespace, as does each index.  Thus if every collection had one index, we can create up to 12,000 collections.  Use the --nssize parameter to set a higher limit.

Be aware that there is a certain minimum overhead per collection -- a few KB.  Further, any index will require at least 8KB of data space as the b-tree page size is 8KB.

--nssize

If more collections are required, run mongod with the --nssize parameter specified.  This will make the <database>.ns file larger and support more collections.  Note that --nssize sets the size used for newly created .ns files -- if you have an existing database and wish to resize, after running the db with --nssize, run the db.repairDatabase() command from the shell to adjust the size.

Maximum .ns file size is 2GB.

 

 

MongoDB (BSON) Data Types

Mongo uses special data types in addition to the basic JSON types of string, integer, boolean, double, null, array, and object. These types include date, object id, binary data, regular expression, and code. Each driver implements these types in language-specific ways, see your driver's documentation for details.

 

 

GridFS

<!-- Root decorator: this is a layer of abstraction that Confluence doesn't need. It will be removed eventually. -->

<!-- wiki content -->

GridFS is a specification for storing large files in MongoDB. All of the officially supported driver implement the GridFS spec.

 

 

JSON

For example the following "document" can be stored in Mongo DB:

{ author: 'joe',
  created : new Date('03/28/2009'),
  title : 'Yet another blog post',
  text : 'Here is the text...',
  tags : [ 'example', 'joe' ],
  comments : [ { author: 'jim', comment: 'I disagree' },
              { author: 'nancy', comment: 'Good post' }
  ]
}

This document is a blog post, so we can store in a "posts" collection using the shell:

> doc = { author : 'joe', created : new Date('03/28/2009'), ... }
> db.posts.insert(doc);

MongoDB understands the internals of BSON objects -- not only can it store them, it can query on internal fields and index keys based upon them.  For example the query

> db.posts.find( { "comments.author" : "jim" } )

is possible and means "find any blog post where at least one comment subjobject has author == 'jim'".

 

  

> j = { name : "mongo" };
{"name" : "mongo"}
> t = { x : 3 };
{ "x" : 3  }
> db.things.save(j);
> db.things.save(t);
> db.things.find();
{ "_id" : ObjectId("4c2209f9f3924d31102bd84a"), "name" : "mongo" }
{ "_id" : ObjectId("4c2209fef3924d31102bd84b"), "x" : 3 }
>

 

 

 

Let's add some more records to this collection:

> for (var i = 1; i <= 20; i++) db.things.save({x : 4, j : i});
> db.things.find();
{ "_id" : ObjectId("4c2209f9f3924d31102bd84a"), "name" : "mongo" }
{ "_id" : ObjectId("4c2209fef3924d31102bd84b"), "x" : 3 }
{ "_id" : ObjectId("4c220a42f3924d31102bd856"), "x" : 4, "j" : 1 }
{ "_id" : ObjectId("4c220a42f3924d31102bd857"), "x" : 4, "j" : 2 }
{ "_id" : ObjectId("4c220a42f3924d31102bd858"), "x" : 4, "j" : 3 }
{ "_id" : ObjectId("4c220a42f3924d31102bd859"), "x" : 4, "j" : 4 }
{ "_id" : ObjectId("4c220a42f3924d31102bd85a"), "x" : 4, "j" : 5 }
{ "_id" : ObjectId("4c220a42f3924d31102bd85b"), "x" : 4, "j" : 6 }
{ "_id" : ObjectId("4c220a42f3924d31102bd85c"), "x" : 4, "j" : 7 }
{ "_id" : ObjectId("4c220a42f3924d31102bd85d"), "x" : 4, "j" : 8 }
{ "_id" : ObjectId("4c220a42f3924d31102bd85e"), "x" : 4, "j" : 9 }
{ "_id" : ObjectId("4c220a42f3924d31102bd85f"), "x" : 4, "j" : 10 }
{ "_id" : ObjectId("4c220a42f3924d31102bd860"), "x" : 4, "j" : 11 }
{ "_id" : ObjectId("4c220a42f3924d31102bd861"), "x" : 4, "j" : 12 }
{ "_id" : ObjectId("4c220a42f3924d31102bd862"), "x" : 4, "j" : 13 }
{ "_id" : ObjectId("4c220a42f3924d31102bd863"), "x" : 4, "j" : 14 }
{ "_id" : ObjectId("4c220a42f3924d31102bd864"), "x" : 4, "j" : 15 }
{ "_id" : ObjectId("4c220a42f3924d31102bd865"), "x" : 4, "j" : 16 }
{ "_id" : ObjectId("4c220a42f3924d31102bd866"), "x" : 4, "j" : 17 }
{ "_id" : ObjectId("4c220a42f3924d31102bd867"), "x" : 4, "j" : 18 }
has more

Querying

<!-- Root decorator: this is a layer of abstraction that Confluence doesn't need. It will be removed eventually. -->
<!-- wiki content -->

One of MongoDB's best capabilities is its support for dynamic (ad hoc) queries. Systems that support dynamic queries don't require any special indexing to find data; users can find data using any criteria. For relational databases, dynamic queries are the norm. If you're moving to MongoDB from a relational databases, you'll find that many SQL queries translate easily to MongoDB's document-based query language.

return every document in the users collection. Our query would look like this:
  db.users.find({})

In this case, our selector is an empty document, which matches every document in the collection. Here's a more selective example:

  db.users.find({'last_name': 'Smith'})

Here our selector will match every document where the last_name attribute is 'Smith.'

Field Selection

In addition to the query expression, MongoDB queries can take some additional arguments. For example, it's possible to request only certain fields be returned. If we just wanted the social security numbers of users with the last name of 'Smith,' then from the shell we could issue this query:

  // retrieve ssn field for documents where last_name == 'Smith':
  db.users.find({last_name: 'Smith'}, {'ssn': 1});

  // retrieve all fields *except* the thumbnail field, for all documents:
  db.users.find({}, {thumbnail:0});

Note the _id field is always returned even when not explicitly requested.

Sorting

MongoDB queries can return sorted results. To return all documents and sort by last name in ascending order, we'd query like so:

  db.users.find({}).sort({last_name: 1});

Skip and Limit

MongoDB also supports skip and limit for easy paging. Here we skip the first 20 last names, and limit our result set to 10:

db.users.find().skip(20).limit(10);
db.users.find({}, {}, 10, 20); // same as above, but less clear

slaveOk

When querying a replica pair or replica set, drivers route their requests to the master mongod by default; to perform a query against an (arbitrarily-selected) slave, the query can be run with the slaveOk option. Here's how to do so in the shell:

db.getMongo().setSlaveOk(); // enable querying a slave
db.users.find(...)

Note: some language drivers permit specifying the slaveOk option on each find(), others make this a connection-wide setting. See your language's driver for details.

Cursors

Database queries, performed with the find() method, technically work by returning a cursor. Cursors are then used to iteratively retrieve all the documents returned by the query. For example, we can iterate over a cursor in the mongo shell like this:

> var cur = db.example.find();
> cur.forEach( function(x) { print(tojson(x))});
{"n" : 1 , "_id" : "497ce96f395f2f052a494fd4"}
{"n" : 2 , "_id" : "497ce971395f2f052a494fd5"}
{"n" : 3 , "_id" : "497ce973395f2f052a494fd6"}
>

Removing Objects from a Collection

To remove objects from a collection, use the remove() function in the mongo shell. (Other drivers offer a similar
function, but may call the function "delete". Please check your driver's documentation ).

remove() is like find() in that it takes a JSON-style query document as an argument to select which documents are removed. If you call remove() without a document argument, or with an empty document {}, it will remove all documents in the collection. Some examples :

db.things.remove({});    // removes all
db.things.remove({n:1}); // removes all where n == 1

If you have a document in memory and wish to delete it, the most efficient method is to specify the item's document _id value as a criteria:

db.things.remove({_id: myobject._id});

You may be tempted to simply pass the document you wish to delete as the selector, and this will work, but it's inefficient.

SELECT * FROM things WHERE name="mongo"
> db.things.find({name:"mongo"}).forEach(printjson);
{ "_id" : ObjectId("4c2209f9f3924d31102bd84a"), "name" : "mongo" }
SELECT * FROM things WHERE x=4
> db.things.find({x:4}).forEach(printjson);
{ "_id" : ObjectId("4c220a42f3924d31102bd856"), "x" : 4, "j" : 1 }
{ "_id" : ObjectId("4c220a42f3924d31102bd857"), "x" : 4, "j" : 2 }
{ "_id" : ObjectId("4c220a42f3924d31102bd858"), "x" : 4, "j" : 3 }
{ "_id" : ObjectId("4c220a42f3924d31102bd859"), "x" : 4, "j" : 4 }
{ "_id" : ObjectId("4c220a42f3924d31102bd85a"), "x" : 4, "j" : 5 }
{ "_id" : ObjectId("4c220a42f3924d31102bd85b"), "x" : 4, "j" : 6 }
{ "_id" : ObjectId("4c220a42f3924d31102bd85c"), "x" : 4, "j" : 7 }
{ "_id" : ObjectId("4c220a42f3924d31102bd85d"), "x" : 4, "j" : 8 }
{ "_id" : ObjectId("4c220a42f3924d31102bd85e"), "x" : 4, "j" : 9 }
{ "_id" : ObjectId("4c220a42f3924d31102bd85f"), "x" : 4, "j" : 10 }
{ "_id" : ObjectId("4c220a42f3924d31102bd860"), "x" : 4, "j" : 11 }
{ "_id" : ObjectId("4c220a42f3924d31102bd861"), "x" : 4, "j" : 12 }
{ "_id" : ObjectId("4c220a42f3924d31102bd862"), "x" : 4, "j" : 13 }
{ "_id" : ObjectId("4c220a42f3924d31102bd863"), "x" : 4, "j" : 14 }
{ "_id" : ObjectId("4c220a42f3924d31102bd864"), "x" : 4, "j" : 15 }
{ "_id" : ObjectId("4c220a42f3924d31102bd865"), "x" : 4, "j" : 16 }
{ "_id" : ObjectId("4c220a42f3924d31102bd866"), "x" : 4, "j" : 17 }
{ "_id" : ObjectId("4c220a42f3924d31102bd867"), "x" : 4, "j" : 18 }
{ "_id" : ObjectId("4c220a42f3924d31102bd868"), "x" : 4, "j" : 19 }
{ "_id" : ObjectId("4c220a42f3924d31102bd869"), "x" : 4, "j" : 20 }

The query expression is an document itself. A query document of the form { a:A, b:B, ... } means "where a==A and b==B and ...". More information on query capabilities may be found in the Queries and Cursors section of the Mongo Developers' Guide.

 

 

To illustrate, lets repeat the last example find({x:4}) with an additional argument that limits the returned document to just the "j" elements:

SELECT j FROM things WHERE x=4
> db.things.find({x:4}, {j:true}).forEach(printjson);
{ "_id" : ObjectId("4c220a42f3924d31102bd856"), "j" : 1 }
{ "_id" : ObjectId("4c220a42f3924d31102bd857"), "j" : 2 }
{ "_id" : ObjectId("4c220a42f3924d31102bd858"), "j" : 3 }
{ "_id" : ObjectId("4c220a42f3924d31102bd859"), "j" : 4 }
{ "_id" : ObjectId("4c220a42f3924d31102bd85a"), "j" : 5 }
{ "_id" : ObjectId("4c220a42f3924d31102bd85b"), "j" : 6 }
{ "_id" : ObjectId("4c220a42f3924d31102bd85c"), "j" : 7 }
{ "_id" : ObjectId("4c220a42f3924d31102bd85d"), "j" : 8 }
{ "_id" : ObjectId("4c220a42f3924d31102bd85e"), "j" : 9 }
{ "_id" : ObjectId("4c220a42f3924d31102bd85f"), "j" : 10 }
{ "_id" : ObjectId("4c220a42f3924d31102bd860"), "j" : 11 }
{ "_id" : ObjectId("4c220a42f3924d31102bd861"), "j" : 12 }
{ "_id" : ObjectId("4c220a42f3924d31102bd862"), "j" : 13 }
{ "_id" : ObjectId("4c220a42f3924d31102bd863"), "j" : 14 }
{ "_id" : ObjectId("4c220a42f3924d31102bd864"), "j" : 15 }
{ "_id" : ObjectId("4c220a42f3924d31102bd865"), "j" : 16 }
{ "_id" : ObjectId("4c220a42f3924d31102bd866"), "j" : 17 }
{ "_id" : ObjectId("4c220a42f3924d31102bd867"), "j" : 18 }
{ "_id" : ObjectId("4c220a42f3924d31102bd868"), "j" : 19 }
{ "_id" : ObjectId("4c220a42f3924d31102bd869"), "j" : 20 }

Note that the "_id" field is always returned.

 

However, the findOne() method is both convenient and efficient:

> printjson(db.things.findOne({name:"mongo"}));
{ "_id" : ObjectId("4c2209f9f3924d31102bd84a"), "name" : "mongo" }

This is more efficient because the client requests a single object from the database, so less work is done by the database and the network. This is the equivalent of find({name:"mongo"}).limit(1).

 

 

This is highly recommended for performance reasons, as it limits the work the database does, and limits the amount of data returned over the network. For example:

> db.things.find().limit(3);
{ "_id" : ObjectId("4c2209f9f3924d31102bd84a"), "name" : "mongo" }
{ "_id" : ObjectId("4c2209fef3924d31102bd84b"), "x" : 3 }
{ "_id" : ObjectId("4c220a42f3924d31102bd856"), "x" : 4, "j" : 1 }
分享到:
评论

相关推荐

    ruby on rails对mongodb的操作

    ruby on rails对mongodb的操作ruby on rails对mongodb的操作ruby on rails对mongodb的操作ruby on rails对mongodb的操作

    MongoDB on Kubernetes技术解决方案.pptx

    MongoDB on Kubernetes技术解决方案.pptx

    Mastering MongoDB 3.x

    The book is based on MongoDB 3.x and covers topics ranging from database querying using the shell, built in drivers, and popular ODM mappers to more advanced topics such as sharding, high ...

    MongoDB笔记.docx

    一、MongoDB简介 3 二、MongoDB结构 3 二、MongoDB 数据库关系型(这里并不是值关系型数据库的关系) 3 1、MongoDB一对一关系型 3 2、MongoDB一对多关系型 4 3、MongoDB多对多关系型 4 三、创建数据库(mongodb_test...

    Linux安装mongodb客户端

    sudo vim /etc/yum.repos.d/mongodb-org-4.2.repo 写入: [mongodb-org-4.2] name=MongoDB Repository baseurl=https://repo.mongodb.org/yum/redhat/$releasever/mongodb-org/4.2/x86_64/ gpgcheck=1 enabled=1 gpg...

    如何安装MongoDB 如何使用MongoDB

    本课程是一套关于MongoDB应用开发的实战性教程,名为《深入浅出MongoDB应用实战开发(基础、开发指南、系统管理、集群及系统架构)》,教程侧重于讲解MongoDB的常用特性及高级特性,从实际开发的角度出发对MongoDB...

    MongoDB应用设计模式

    资源名称:MongoDB应用设计模式内容简介:无论是在构建社交媒体网站,还是在开发一个仅在内部使用的企业应用程序,《MongoDB应用设计模式》展示了MongoDB需要解决的商业问题之间的连接。你将学到如何把MongoDB设计...

    Big.Data.NoSQL.Architecting.MongoDB.epub

    Getting Started with Installation and coding with Mongo Shell 5: MongoDB Explained 6: Administering MongoDB 7: MongoDB Use Cases and How To’s – Know How To’s for using MongoDB to its best. ...

    五、MongoDB 学习PPT

    MongoDB 学习PPT

    MongoDB图形化管理工具 MongoDB Compass

    MongoDB图形化管理工具 MongoDB Compass

    【BAT必备】MongoDB面试题

    【BAT必备】MongoDB面试题【BAT必备】MongoDB面试题【BAT必备】MongoDB面试题【BAT必备】MongoDB面试题【BAT必备】MongoDB面试题【BAT必备】MongoDB面试题【BAT必备】MongoDB面试题【BAT必备】MongoDB面试题【BAT...

    MongoDB教程基础入门

    教程名称:MongoDB教程基础入门 课程目录:【】MongoDB教程基础入门-代码【】MongoDB教程基础入门01第一讲上【】MongoDB教程基础入门02第一讲下【】MongoDB教程基础入门03第二讲上【】MongoDB教程基础入门04第二讲...

    mongodb安装包

    mongodb,下载mongodb,mongodb,mongodb,mongodb,mongodb,mongodb,mongodb,mongodb,mongodb,mongodb,mongodb,mongodb,mongodb,

    Practical.MongoDB.Architecting.Developing.and.Administering.MongoDB

    With so many companies opting for MongoDB as their NoSQL database of choice, there's a need for a practical how-to combined with expert advice for getting the most out of the software. Beginning ...

    mongodb-linux-x86_64-4.0.18.tgz

    mv mongodb-linux-x86_64-4.0.18 mongodb 3、进入 mongodb 目录创建目录 db 和 logs cd /usr/local/mongodb mkdir db mkdir logs 4、进入到 bin 目录下,编辑 mongodb.conf 文件,内容如下: dbpath=/usr/local/...

    Web.Development.with.MongoDB.and.NodeJS.2nd.Edition.178528752

    A practical guide with clear instructions to design and develop a complete web application from start to finish Who This Book Is For This book is designed for JavaScript developers of any skill level ...

    MongoDB.Data.Modeling.1782175342

    Focus on data usage and better design schemas with the help of MongoDB About This Book Create reliable, scalable data models with MongoDB Optimize the schema design process to support applications of...

    MongoDB 5.0.6 windows版本

    MongoDB 5.0.6 windows版本

    MongoDB4.2分片及副本集群搭建

    MongoDB4.2分片及副本集群搭建 MongoDB集群 MongoDB分片 MongoDB副本 MongoDB副本集群

    基于MongoDB的日志系统Mongodb-Log.zip

    mongodb-log 是一个基于MongoDB的Python日志系统。 MongoDB 的 Capped Collection是一个天生的日志系统,MongoDB自己的oplog就是用它来存储的,Capped Collection的特点是可以指定Collection的大小,当记录总大小...

Global site tag (gtag.js) - Google Analytics