
Still Can't Do CUD in ES?

Source: http://www.cnblogs.com/liusy01/p/13721500.html

Recently I had to work with ES on the job, but I wasn't familiar with querying documents in it, so I've spent the past two weeks studying ES and skimming Elasticsearch: The Definitive Guide. Another day happily slacked away.

ES is a real-time distributed search and analytics engine built on Lucene. Today we won't talk about its use cases; instead, let's walk through creating, updating, and deleting ES indices.

Environment: CentOS 7, Elasticsearch 6.8.3, JDK 8

(The latest ES is version 7, which requires JDK 11 or later, so I installed ES 6.8.3.)

All the examples below use a student index.

1. Creating an index

PUT   http://192.168.197.100:9200/student
{
    "mapping":{
      "_doc":{ //“_doc”是类型type,es6中一个索引下只有一个type,不能有其它type
        "properties":{
          "id": {
              "type": "keyword"
          },
          "name":{
            "type":"text",
            "index":"analyzed",
            "analyzer":"standard"
          },
          "age":{
            "type":"integer",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above":256
              }
            }
          },
          "birthday":{
            "type":"date"
          },
          "gender":{
            "type":"keyword"
          },
          "grade":{
            "type":"text",
            "fields":{
              "keyword":{
                "type":"keyword",
                 "ignore_above":256
              }
            }
          },
          "class":{
            "type":"text",
            "fields":{
              "keyword":{
                "type":"keyword",
                 "ignore_above":256
              }
            }
          }
        }
      }
    },
    "settings":{
      // number of primary shards
      "number_of_shards" : 1, 
      // number of replica copies per shard
      "number_of_replicas" : 1
    }
}

The difference between the text and keyword types:

(1) text is analyzed (split into tokens) and is used for full-text search

(2) keyword is not analyzed; it is matched as a single whole value and is used for exact matching and aggregations
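
A quick way to see the difference is the _analyze API (covered in section 4 below). The sample sentence here is made up for illustration; the standard analyzer splits it into separate tokens, while the keyword analyzer emits the whole string as one token:

POST http://192.168.197.100:9200/_analyze
{
  "text":"grade seven class one",
  "analyzer":"keyword"
}

// returns the single token "grade seven class one";
// with "analyzer":"standard" the same text yields four tokens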

The index parameter controls how a field is indexed. In ES 2.x and earlier, string fields accepted three values:

(1) analyzed: the field is analyzed and supports fuzzy matching, similar to LIKE in SQL

(2) not_analyzed: the field can only be matched exactly, similar to "=" in SQL

(3) no: the field is not searchable at all

Since ES 5, index is simply a boolean: true (the default; the field is searchable) or false (not searchable). Whether a value is analyzed is now determined by the field type (text vs keyword), which is why the mapping above uses "index": true.
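
As a minimal sketch of the boolean form (the log index and remark field are made up for illustration), a field that should be stored but never searched can be mapped like this:

PUT http://192.168.197.100:9200/log
{
  "mappings":{
    "_doc":{
      "properties":{
        "remark":{
          "type":"keyword",
          "index":false // returned in _source, but cannot be queried
        }
      }
    }
  }
}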

The analyzer parameter selects the analyzer. For Chinese this is usually the IK analyzer, and you can also define a custom analyzer.
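
Below is a minimal sketch of a custom analyzer declared in the index settings (the index name student2, the analyzer name my_analyzer, and the filter chain are all made up for illustration):

PUT http://192.168.197.100:9200/student2
{
  "settings":{
    "analysis":{
      "analyzer":{
        "my_analyzer":{ // hypothetical name
          "type":"custom",
          "tokenizer":"standard",
          "filter":["lowercase"] // lowercases every token
        }
      }
    }
  }
}

A field can then opt into it with "analyzer":"my_analyzer" in its mapping.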

number_of_shards is the number of primary shards. It defaults to 5 in ES 6 and cannot be changed after the index is created.

number_of_replicas is the number of replica copies per primary shard. It defaults to 1 and can be changed at any time.

On success, the following JSON is returned:

{    "acknowledged": true,    "shards_acknowledged": true,    "index": "student"}

How do you inspect the details of an index after creating it?

GET http://192.168.197.100:9200/student/_mapping

In ES 6, an index can contain only one type, such as "_doc" above.
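
The settings can be inspected the same way, and fetching the index itself returns mappings and settings together:

GET http://192.168.197.100:9200/student/_settings

GET http://192.168.197.100:9200/student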

Comparing ES with a relational database:

Elasticsearch    Relational database
Index            Database
Type             Table
Document         Row
Field            Column
Mapping          Schema

2. Updating an index

// change the number of replicas to 2
PUT http://192.168.197.100:9200/student/_settings
{
  "number_of_replicas":2
}
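
number_of_replicas is a dynamic setting, so the call above works on a live index. Static settings, such as the analysis section, can only be changed while the index is closed. A sketch, reusing the hypothetical my_analyzer from section 1:

POST http://192.168.197.100:9200/student/_close

PUT http://192.168.197.100:9200/student/_settings
{
  "analysis":{
    "analyzer":{
      "my_analyzer":{
        "type":"custom",
        "tokenizer":"standard",
        "filter":["lowercase"]
      }
    }
  }
}

POST http://192.168.197.100:9200/student/_open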

3. Deleting an index

// delete a single index 
DELETE http://192.168.197.100:9200/student

// delete all indices (use with extreme caution)
DELETE  http://192.168.197.100:9200/_all
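
Because deleting _all is so destructive, a common safeguard (a cluster-level setting, not part of the original walkthrough) is to require explicit index names for destructive operations:

PUT http://192.168.197.100:9200/_cluster/settings
{
  "persistent":{
    "action.destructive_requires_name":true
  }
}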

4. Comparing the default standard analyzer with the IK analyzer

ES's default analyzer is standard. For English it splits on word boundaries (roughly, whitespace and punctuation), but it breaks Chinese into individual characters, so it is a poor choice for Chinese text.

For example, standard on English text:

// this API shows how a piece of text is tokenized 
POST http://192.168.197.100:9200/_analyze
{
  "text":"the People's Republic of China",
  "analyzer":"standard"
}

The result:

{
    "tokens": [
        {
            "token": "the",
            "start_offset": 0,
            "end_offset": 3,
            "type": "<ALPHANUM>",
            "position": 0
        },
        {
            "token": "people's",
            "start_offset": 4,
            "end_offset": 12,
            "type": "<ALPHANUM>",
            "position": 1
        },
        {
            "token": "republic",
            "start_offset": 13,
            "end_offset": 21,
            "type": "<ALPHANUM>",
            "position": 2
        },
        {
            "token": "of",
            "start_offset": 22,
            "end_offset": 24,
            "type": "<ALPHANUM>",
            "position": 3
        },
        {
            "token": "china",
            "start_offset": 25,
            "end_offset": 30,
            "type": "<ALPHANUM>",
            "position": 4
        }
    ]
}

And on Chinese text:

POST http://192.168.197.100:9200/_analyze
{
  "text":"中华人民共和国万岁",
  "analyzer":"standard"
}

The result:

{
    "tokens": [
        {
            "token": "中",
            "start_offset": 0,
            "end_offset": 1,
            "type": "<IDEOGRAPHIC>",
            "position": 0
        },
        {
            "token": "华",
            "start_offset": 1,
            "end_offset": 2,
            "type": "<IDEOGRAPHIC>",
            "position": 1
        },
        {
            "token": "人",
            "start_offset": 2,
            "end_offset": 3,
            "type": "<IDEOGRAPHIC>",
            "position": 2
        },
        {
            "token": "民",
            "start_offset": 3,
            "end_offset": 4,
            "type": "<IDEOGRAPHIC>",
            "position": 3
        },
        {
            "token": "共",
            "start_offset": 4,
            "end_offset": 5,
            "type": "<IDEOGRAPHIC>",
            "position": 4
        },
        {
            "token": "和",
            "start_offset": 5,
            "end_offset": 6,
            "type": "<IDEOGRAPHIC>",
            "position": 5
        },
        {
            "token": "国",
            "start_offset": 6,
            "end_offset": 7,
            "type": "<IDEOGRAPHIC>",
            "position": 6
        },
        {
            "token": "万",
            "start_offset": 7,
            "end_offset": 8,
            "type": "<IDEOGRAPHIC>",
            "position": 7
        },
        {
            "token": "岁",
            "start_offset": 8,
            "end_offset": 9,
            "type": "<IDEOGRAPHIC>",
            "position": 8
        }
    ]
}

The IK analyzer plugin supports word-level segmentation of Chinese and provides two analyzers: ik_smart and ik_max_word.

(1) ik_smart: the coarsest-grained segmentation, producing a brief split with the fewest, longest tokens

For example:

POST http://192.168.197.100:9200/_analyze
{
  "text":"中华人民共和国万岁",
  "analyzer":"ik_smart"
}

The result:

{
    "tokens": [
        {
            "token": "中华人民共和国",
            "start_offset": 0,
            "end_offset": 7,
            "type": "CN_WORD",
            "position": 0
        },
        {
            "token": "万岁",
            "start_offset": 7,
            "end_offset": 9,
            "type": "CN_WORD",
            "position": 1
        }
    ]
}

(2) ik_max_word: the finest-grained segmentation, splitting the text into as many words as possible

For example:

POST http://192.168.197.100:9200/_analyze
{
  "text":"中华人民共和国万岁",
  "analyzer":"ik_max_word"
}

The result:

{
    "tokens": [
        {
            "token": "中华人民共和国",
            "start_offset": 0,
            "end_offset": 7,
            "type": "CN_WORD",
            "position": 0
        },
        {
            "token": "中华人民",
            "start_offset": 0,
            "end_offset": 4,
            "type": "CN_WORD",
            "position": 1
        },
        {
            "token": "中华",
            "start_offset": 0,
            "end_offset": 2,
            "type": "CN_WORD",
            "position": 2
        },
        {
            "token": "华人",
            "start_offset": 1,
            "end_offset": 3,
            "type": "CN_WORD",
            "position": 3
        },
        {
            "token": "人民共和国",
            "start_offset": 2,
            "end_offset": 7,
            "type": "CN_WORD",
            "position": 4
        },
        {
            "token": "人民",
            "start_offset": 2,
            "end_offset": 4,
            "type": "CN_WORD",
            "position": 5
        },
        {
            "token": "共和国",
            "start_offset": 4,
            "end_offset": 7,
            "type": "CN_WORD",
            "position": 6
        },
        {
            "token": "共和",
            "start_offset": 4,
            "end_offset": 6,
            "type": "CN_WORD",
            "position": 7
        },
        {
            "token": "国",
            "start_offset": 6,
            "end_offset": 7,
            "type": "CN_CHAR",
            "position": 8
        },
        {
            "token": "万岁",
            "start_offset": 7,
            "end_offset": 9,
            "type": "CN_WORD",
            "position": 9
        },
        {
            "token": "万",
            "start_offset": 7,
            "end_offset": 8,
            "type": "TYPE_CNUM",
            "position": 10
        },
        {
            "token": "岁",
            "start_offset": 8,
            "end_offset": 9,
            "type": "COUNT",
            "position": 11
        }
    ]
}

IK on English text:

POST http://192.168.197.100:9200/_analyze
{
  "text":"the People's Republic of China",
  "analyzer":"ik_smart"
}

The result: IK drops unimportant stopwords, which the standard analyzer keeps. (My English has decayed to the point where I no longer know what part of speech a/an/the are; as a Chinese speaker, that's nothing to be proud of.)

{
    "tokens": [
        {
            "token": "people",
            "start_offset": 4,
            "end_offset": 10,
            "type": "ENGLISH",
            "position": 0
        },
        {
            "token": "s",
            "start_offset": 11,
            "end_offset": 12,
            "type": "ENGLISH",
            "position": 1
        },
        {
            "token": "republic",
            "start_offset": 13,
            "end_offset": 21,
            "type": "ENGLISH",
            "position": 2
        },
        {
            "token": "china",
            "start_offset": 25,
            "end_offset": 30,
            "type": "ENGLISH",
            "position": 3
        }
    ]
}
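
With the IK plugin installed, a common pattern (a sketch, not from the original post; the article index and content field are made up) is to index with the fine-grained ik_max_word for recall and to analyze queries with the coarser ik_smart for precision:

PUT http://192.168.197.100:9200/article
{
  "mappings":{
    "_doc":{
      "properties":{
        "content":{
          "type":"text",
          "analyzer":"ik_max_word",      // used when indexing documents
          "search_analyzer":"ik_smart"   // used when analyzing search input
        }
      }
    }
  }
}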

5. Adding a document

Fields can be added freely; by default, ES dynamically extends the mapping for fields it has not seen before.

// 1 is the value of "_id"; it must be unique, and ES can also generate one randomly
POST http://192.168.197.100:9200/student/_doc/1
{
  "id":1,
  "name":"tom",
  "age":20,
  "gender":"male",
  "grade":"7",
  "class":"1"
}
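
As the comment above hints, posting without an id makes ES generate a random unique _id, which is returned in the "_id" field of the response (the field values here are made up):

POST http://192.168.197.100:9200/student/_doc
{
  "id":2,
  "name":"lucy",
  "age":19,
  "gender":"female",
  "grade":"7",
  "class":"2"
}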

6. Updating a document

POST http://192.168.197.100:9200/student/_doc/1/_update
{
  "doc":{
    "name":"jack"
  }
}
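
Besides merging a partial document via "doc", an update can run a script against the stored source (a sketch; the age increment is just an example):

POST http://192.168.197.100:9200/student/_doc/1/_update
{
  "script":{
    "source":"ctx._source.age += params.n",
    "params":{
      "n":1
    }
  }
}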

7. Deleting a document

// 1 is the "_id" value 
DELETE http://192.168.197.100:9200/student/_doc/1
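
To delete by condition instead of by id, _delete_by_query accepts a regular query body (a sketch; the match clause is illustrative):

POST http://192.168.197.100:9200/student/_delete_by_query
{
  "query":{
    "match":{
      "name":"jack"
    }
  }
}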

That covers the basics of creating, updating, and deleting ES indices and adding, updating, and deleting documents. To keep this post from running too long, document queries will be covered in the next one.


