ANKUSH THAVALI
- 11 Jun, 2019
- 0 Comments
- 4 Mins Read
Hive partition with example – Interview Question

Hive Partitions
Create Hive Partition Table
CREATE TABLE zipcodes( RecordNumber int, Country string, City string, Zipcode int) PARTITIONED BY(state string) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
Create non-partition table
CREATE TABLE zipcodes_temp( RecordNumber int, Country string, City string, Zipcode int, state string) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
Load data to non-partition table
LOAD DATA INPATH '/data/zipcodes.csv' INTO TABLE zipcodes_temp;
Set the Hive property
SET hive.exec.dynamic.partition=true; SET hive.exec.dynamic.partition.mode=nonstrict;
Load data from non-partition to partition table
INSERT INTO zipcodes SELECT * FROM zipcodes_temp;
Show All Partitions on Hive Table
SHOW PARTITIONS zipcodes; +------------+ | partition | +------------+ | state=AL | | state=AZ | | state=FL | | state=NC | | state=PR | | state=TX | +------------+ 6 rows selected (0.074 seconds)
Add New Partition to the Hive Table
jdbc:hive2://127.0.0.1:10000>ALTER TABLE zipcodes ADD PARTITION (state='CA') LOCATION '/user/data/zipcodes_ca';
jdbc:hive2://127.0.0.1:10000> SHOW PARTITIONS zipcodes; +------------+ | partition | +------------+ | state=AL | | state=AZ | | state=CA | | state=FL | | state=NC | | state=PR | | state=TX | +------------+ 7 rows selected (0.081 seconds)
Rename or Update Hive Partition
jdbc:hive2://127.0.0.1:10000> ALTER TABLE zipcodes PARTITION (state='AL') RENAME TO PARTITION (state='NY');
Manually Renaming Partitions on HDFS
hdfs dfs -mv /user/hive/warehouse/zipcodes/state=NY /user/hive/warehouse/zipcodes/state=AL
When you manually modify the partitions directly on HDFS, you need to run MSCK REPAIR TABLE to update the Hive Metastore. Not doing so will result in inconsistent results.SELECT doesn’t show the renamed partition SHOW PARTITIONS still shows the older partition
jdbc:hive2://127.0.0.1:10000> SHOW PARTITIONS zipcodes; +------------+ | partition | +------------+ | state=AZ | | state=CA | | state=FL | | state=NC | | state=NY | | state=PR | | state=TX | +------------+
jdbc:hive2://127.0.0.1:10000>MSCK REPAIR TABLE zipcodes SYNC PARTITIONS;
Drop Hive Partition
jdbc:hive2://127.0.0.1:10000> ALTER TABLE zipcodes DROP IF EXISTS PARTITION (state='AL');
Manually Dropping Partitions on HDFS
user@namenode:~/hive$ hdfs dfs -rm -R /user/hive/warehouse/zipcodes/state=AL Deleted /user/hive/warehouse/zipcodes/state=AL SHOW PARTITIONS still shows the deleted partitions. jdbc:hive2://127.0.0.1:10000> SHOW PARTITIONS zipcodes; +-----------------------------------+ | partition | +-----------------------------------+ | state=AL | | state=AZ | | state=FL | | state=NC | | state=PR | | state=TX | +-----------------------------------+ 6 rows selected (0.096 seconds) In order to fix this, you need to run MSCK REPAIR TABLE as shown below. jdbc:hive2://127.0.0.1:10000> MSCK REPAIR TABLE zipcodes; jdbc:hive2://127.0.0.1:10000> SHOW PARTITIONS zipcodes; +-----------------------------------+ | partition | +-----------------------------------+ | state=AZ | | state=FL | | state=NC | | state=PR | | state=TX | +-----------------------------------+ 5 rows selected (0.096 seconds)
Filter Partitions
SHOW PARTITIONS zipcodes PARTITION(state='NC'); +------------+ | partition | +------------+ | state=NC | +------------+ 1 row selected (0.182 seconds)
Specific Partition Location on HDFS
DESCRIBE FORMATTED zipcodes PARTITION(state='PR'); SHOW TABLE EXTENDED LIKE zipcodes PARTITION(state='PR');
Static Partition in HIve
Features of Static Partitioning
- Partitions are manually added so it is also known as manual partition
- Data Loading in static partitioning is faster as compare to dynamic partitioning so static partitioning is preferred when we have massive files to load.
- In static partitioning individual files are loaded as per the partition we want to set.
- where clause is used in order to use limit in static partition
- Altering the partition in the static partition is allowed whereas dynamic partition doesn’t support Alter statement.
Let's create a table 'student' with the following attributes (student_name, father_name, and percentage ) and partition it using 'section' in our default database.
CREATE TABLE student(student_name STRING ,father_name STRING ,percentage FLOAT) partitioned by (section STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
Create seperate file for section A, section B,section C and section D.
Loading data from student_A partitioned with section 'A'
LOAD DATA LOCAL INPATH '/home/hdoop/Documents/student_A' INTO TABLE student partition(section = "A");
Loading data from student_B partitioned with section 'B'
LOAD DATA LOCAL INPATH '/home/hdoop/Documents/student_B' INTO TABLE student partition(section = 'B');
Loading data from student_C partitioned with section 'C'
LOAD DATA LOCAL INPATH '/home/hdoop/Documents/student_C' INTO TABLE student partition(section = 'C');
Loading data from student_D partitioned with section 'D'
LOAD DATA LOCAL INPATH '/home/hdoop/Documents/student_D' INTO TABLE student partition(section = 'D');
Now go to your HDFS(/user/hive/warehouse/) and check the student table to see how the partitions are made.
Partitioning an external table
CREATE EXTERNAL TABLE customer_external(id STRING, name STRING, gender STRING, state STRING) PARTITIONED BY (country STRING);
Now a partition can be added to the EXTERNAL table, using the ALTER TABLE ADD PARTITION command:
ALTER TABLE customer_external ADD PARTITION(country='UK') LOCATION '/user/hive/warehouse/customer/country=UK'