Partitioning in Apache Hive

Reading Time: 2 minutes

Partitions

Hive is a good tool for performing queries on large datasets, especially datasets that require full table scans. But quite often there are instances where users need to filter the data on specific column values.thats where Partitioning comes into play a partition is nothing but a directory which contains the chunk of data when we do partitioning, we create a partition for each unique value of the column

lets run a simple example to see what it is

syntax to create partition table is

create table tablename(colname type) partitioned by(colname type);

if hive.exec.dynamic.partition.mode is set to strict, then you need to do at least one static partition. In non-strict mode, all partitions are allowed to be dynamic

yyy

partition

here we create a table named emp info with two fields name and addresss we partitioned the table by column ID of type int and then we insert the value in this table

it’s important to consider the cardinality of the column that will be partitioned on. Selecting a column with high cardinality will result in fragmentation of data Do not over-partition the data. With too many small partitions, the task of recursively scanning the directories becomes more expensive than a full table scan of the table.

syntax for inserting values is

insert into partition values();

first we insert record with id=1 now insert another record with id=2

par

now got to /user/hive

/warehouse/default/empinfo directory in your hdfs

as we can see there are two partitions one with name id=1 and other one as id =2 now when a select query is fired with where clause it will not scan the full table it will only scan the required partition

if you will tried it with a non partition table with large dataset it will take more time in comparison because it will have to go through entire table scan

i hope this blog will be helpful happy coding

Partitions

Recommend

首次用间接方法找到了银河系外的一颗小质量黑洞

Make your Oculus Quest 2 fit — and sound— better with these discounted mods

使用MRS CDL实现实时数据同步的极致性能

OPPO Reno 7 Pro：人像视频手机的出类拔萃之辈

高效采集数据，帮助应用业务增长

何为阴阳化负面情绪为动力

又有新玩法！荣耀60系列或将隔空手势功能融入Vlog当中

彻底搞懂访问者模式的静态、动态和伪动态分派

Avro Communication over TCP Sockets

好看的电子书还能赚钱月入1800元超简单人人皆可【附软件】

About Joyk