
Schemaless

 

In the NoSQL world it is common to talk about schemaless databases or data models.

“Dynamic schema” would be more precise.  MongoDB has databases, a system catalog of collections, documents within collections, and explicitly declared indexes on a collection.  The big difference is that “columns”, or rather fields in the document data model, are not predeclared.  Each field in a document is dynamic: it can be present or missing.  Each value still has a datatype, so the model is not typeless but dynamically typed, or what some might call duck typed.

Here’s an example in the mongo shell.  We may have a couple docs:

> db.persons.find()
{ "name" : "jane", "age" : 25 }
{ "name" : "ben", "age" : 30 }

We could then add a new person with an extra attribute:

> db.persons.insert({name:'julie',age:28,likes:'baseball'})
> db.persons.find()
{ "name" : "jane", "age" : 25 }
{ "name" : "ben", "age" : 30 }
{ "name" : "julie", "age" : 28, "likes" : "baseball" }

No “alter table” necessary.  This is very helpful with agile development methodologies. 
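
To make the “dynamic schema” idea concrete, here is a minimal sketch in plain JavaScript (not the mongo shell) that derives the implicit schema of the three documents above by recording which types each field takes.  The array simply stands in for the collection, and the helper name observedSchema is hypothetical, not a MongoDB function.

```javascript
// Plain-JavaScript stand-in for the persons collection (an assumption for illustration).
const persons = [
  { name: "jane", age: 25 },
  { name: "ben", age: 30 },
  { name: "julie", age: 28, likes: "baseball" },
];

// Hypothetical helper: for each field, collect the value types seen
// and count how many documents carry that field.
function observedSchema(docs) {
  const schema = {};
  for (const doc of docs) {
    for (const [field, value] of Object.entries(doc)) {
      const type = Array.isArray(value) ? "array" : typeof value;
      if (!schema[field]) schema[field] = { types: new Set(), count: 0 };
      schema[field].types.add(type);
      schema[field].count += 1;
    }
  }
  return schema;
}

const schema = observedSchema(persons);
for (const [field, info] of Object.entries(schema)) {
  // Print the field name, its observed types, and its coverage in the collection.
  console.log(field, [...info.types].join("|"), `${info.count}/${persons.length}`);
}
```

The schema here is a property of the documents themselves: likes shows up in one of the three documents, and nothing had to be declared beforehand for that to be valid.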

We can take it a step further, however.  The type of a field’s value need not be consistent from document to document.  In practice, it is very common for the contents of a collection to be homogeneous, but we have the option.  For example, suppose we want to add “likes” for ben, but ben likes a couple of things.  What to do?

> db.persons.update({name:'ben'},{$set:{likes:['math','baseball']}})
> db.persons.find()
{ "name" : "jane", "age" : 25 }
{ "name" : "julie", "age" : 28, "likes" : "baseball" }
{ "name" : "ben", "age" : 30, "likes" : [ "math", "baseball" ] }

This example works out particularly elegantly: even though one likes value is an array and the other is a string, we can still run interesting queries across both.  That is because when a query tests a field for a value and the field holds an array, MongoDB looks inside the array:

> db.persons.find({likes:'baseball'})
{ "name" : "julie", "age" : 28, "likes" : "baseball" }
{ "name" : "ben", "age" : 30, "likes" : [ "math", "baseball" ] }
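
The matching rule just described can be sketched in a few lines of plain JavaScript.  This is only an illustration of the behavior for scalar query values, not MongoDB’s implementation; the helper name matchesEquality is hypothetical and the array stands in for the collection.

```javascript
// Plain-JavaScript stand-in for the persons collection (an assumption for illustration).
const persons = [
  { name: "jane", age: 25 },
  { name: "julie", age: 28, likes: "baseball" },
  { name: "ben", age: 30, likes: ["math", "baseball"] },
];

// Hypothetical helper mimicking the equality match for a scalar query value:
// a scalar field matches if it equals the value,
// an array field matches if any of its elements equals the value.
function matchesEquality(doc, field, value) {
  const fieldValue = doc[field];
  if (Array.isArray(fieldValue)) return fieldValue.includes(value);
  return fieldValue === value; // a missing field is undefined and never matches
}

const baseballFans = persons.filter((p) => matchesEquality(p, "likes", "baseball"));
console.log(baseballFans.map((p) => p.name));
```

Both julie (string value) and ben (array value) pass the filter, while jane, who has no likes field, does not.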

Likewise we can index the field:

> db.persons.createIndex( { likes : 1 } )
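
An index over a field that holds arrays in some documents is what the MongoDB documentation calls a multikey index: one entry per array element.  The sketch below illustrates that idea in plain JavaScript using a Map from value to document positions.  It is a simplification under stated assumptions: the real index is not a Map, documents without the field are simply skipped here, and the helper name buildIndex is hypothetical.

```javascript
// Plain-JavaScript stand-in for the persons collection (an assumption for illustration).
const persons = [
  { name: "jane", age: 25 },
  { name: "julie", age: 28, likes: "baseball" },
  { name: "ben", age: 30, likes: ["math", "baseball"] },
];

// Hypothetical helper: map each indexed value to the positions of the documents containing it.
// Array values contribute one entry per element; scalar values contribute one entry.
function buildIndex(docs, field) {
  const index = new Map();
  docs.forEach((doc, position) => {
    if (!(field in doc)) return; // simplification: skip documents lacking the field
    const keys = Array.isArray(doc[field]) ? doc[field] : [doc[field]];
    for (const key of keys) {
      if (!index.has(key)) index.set(key, []);
      index.get(key).push(position);
    }
  });
  return index;
}

const likesIndex = buildIndex(persons, "likes");
// Look up matching documents without scanning the whole collection.
const fans = likesIndex.get("baseball").map((position) => persons[position].name);
console.log(fans);
```

Because ben’s array is expanded into separate entries, a lookup for baseball finds both the string-valued and the array-valued document.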

All very handy and useful.  But you might ask, “Won’t my data get rather dirty with no schema constraints?”  I had this concern when we started; I assumed we would just add some constraint rules later when needed.  Oddly, there hasn’t been much demand for the feature so far.  Empirically, the data doesn’t seem to get too noisy.

One other very important note: the dynamic schema is not just for developer friendliness!  There is another good reason for it.  Imagine changing the schema in a database cluster involving 2,000 servers.  It might be tricky to change that global state across every server in a consistent manner.  One goal here is to store very big data sets.  Alter table is probably not going to fly with billions or trillions of documents.
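
One way an application can live without a cluster-wide schema change is to tolerate both old and new document shapes when it reads them.  The following is a minimal sketch in plain JavaScript, assuming the likes field from the examples above may be missing, a single string, or an array; the helper name likesOf is hypothetical.

```javascript
// Hypothetical helper: return a person's likes as an array, whatever shape the document has.
function likesOf(person) {
  const likes = person.likes;
  if (likes === undefined || likes === null) return []; // older documents without the field
  return Array.isArray(likes) ? likes : [likes];        // wrap a single string in an array
}

// Documents in the three shapes seen in the examples above.
const jane = { name: "jane", age: 25 };
const julie = { name: "julie", age: 28, likes: "baseball" };
const ben = { name: "ben", age: 30, likes: ["math", "baseball"] };

for (const person of [jane, julie, ben]) {
  console.log(person.name, likesOf(person));
}
```

With this approach no document has to be rewritten when a field is introduced or its shape changes; the cost is that reading code must handle each shape explicitly.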

P.S. For compactness, the examples above do not show the _id field that MongoDB or its driver automatically adds to all documents.

P.P.S. Dynamic schema is not unique to MongoDB; some other products in the space do it too.  Of course I’m biased: this is my favorite.
