
Hive Partitions

Create Hive Partition Table

CREATE TABLE zipcodes(
RecordNumber int,
Country string,
City string,
Zipcode int)
PARTITIONED BY(state string)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ',';

Create non-partition table

CREATE TABLE zipcodes_temp(
RecordNumber int,
Country string,
City string,
Zipcode int,
state string)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ',';

Load data to non-partition table

LOAD DATA INPATH '/data/zipcodes.csv' INTO TABLE zipcodes_temp;

Set the Hive properties to enable dynamic partitioning

SET hive.exec.dynamic.partition=true;

SET hive.exec.dynamic.partition.mode=nonstrict; 

Load data from non-partition to partition table

INSERT INTO zipcodes SELECT * FROM zipcodes_temp;
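With dynamic partitioning, Hive takes the partition value from the last column of the SELECT. Naming the partition column explicitly makes that intent clear (a sketch using the columns defined above):

```sql
-- PARTITION(state) with no fixed value makes this a dynamic partition insert;
-- the trailing 'state' column of the SELECT supplies each row's partition.
INSERT INTO TABLE zipcodes PARTITION(state)
SELECT RecordNumber, Country, City, Zipcode, state FROM zipcodes_temp;
```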

Show All Partitions on Hive Table

SHOW PARTITIONS zipcodes;

+------------+
| partition |
+------------+
| state=AL |
| state=AZ |
| state=FL |
| state=NC |
| state=PR |
| state=TX |
+------------+
6 rows selected (0.074 seconds)

Add New Partition to the Hive Table

jdbc:hive2://127.0.0.1:10000> ALTER TABLE zipcodes ADD PARTITION (state='CA') LOCATION '/user/data/zipcodes_ca';
jdbc:hive2://127.0.0.1:10000> SHOW PARTITIONS zipcodes;
+------------+
| partition  |
+------------+
| state=AL   |
| state=AZ   |
| state=CA   |
| state=FL   |
| state=NC   |
| state=PR   |
| state=TX   |
+------------+
7 rows selected (0.081 seconds)

Rename or Update Hive Partition

jdbc:hive2://127.0.0.1:10000> ALTER TABLE zipcodes PARTITION (state='AL') RENAME TO PARTITION (state='NY');

Manually Renaming Partitions on HDFS

hdfs dfs -mv /user/hive/warehouse/zipcodes/state=NY /user/hive/warehouse/zipcodes/state=AL

When you modify partitions directly on HDFS, you need to run MSCK REPAIR TABLE to update the Hive metastore. If you don't, queries return inconsistent results: SELECT doesn't find data under the renamed partition, while SHOW PARTITIONS still lists the old partition name.

jdbc:hive2://127.0.0.1:10000> SHOW PARTITIONS zipcodes;
+------------+
| partition  |
+------------+
| state=AZ   |
| state=CA   |
| state=FL   |
| state=NC   |
| state=NY   |
| state=PR   |
| state=TX   |
+------------+
jdbc:hive2://127.0.0.1:10000> MSCK REPAIR TABLE zipcodes SYNC PARTITIONS;

Drop Hive Partition

jdbc:hive2://127.0.0.1:10000> ALTER TABLE zipcodes DROP IF EXISTS PARTITION (state='AL');

Manually Dropping Partitions on HDFS

user@namenode:~/hive$ hdfs dfs -rm -R /user/hive/warehouse/zipcodes/state=AL
Deleted /user/hive/warehouse/zipcodes/state=AL


SHOW PARTITIONS still shows the deleted partition.

jdbc:hive2://127.0.0.1:10000> SHOW PARTITIONS zipcodes;
+-----------------------------------+
|             partition             |
+-----------------------------------+
| state=AL                          |
| state=AZ                          |
| state=FL                          |
| state=NC                          |
| state=PR                          |
| state=TX                          |
+-----------------------------------+
6 rows selected (0.096 seconds)

To fix this, run MSCK REPAIR TABLE as shown below. (In Hive 3, the DROP PARTITIONS or SYNC PARTITIONS option ensures that partitions removed from HDFS are also dropped from the metastore.)


jdbc:hive2://127.0.0.1:10000> MSCK REPAIR TABLE zipcodes SYNC PARTITIONS;
jdbc:hive2://127.0.0.1:10000> SHOW PARTITIONS zipcodes;
+-----------------------------------+
|             partition             |
+-----------------------------------+
| state=AZ                          |
| state=FL                          |
| state=NC                          |
| state=PR                          |
| state=TX                          |
+-----------------------------------+
5 rows selected (0.096 seconds)

Filter Partitions

SHOW PARTITIONS zipcodes PARTITION(state='NC');
+------------+
| partition  |
+------------+
| state=NC   |
+------------+
1 row selected (0.182 seconds)

Specific Partition Location on HDFS

DESCRIBE FORMATTED zipcodes PARTITION(state='PR');
SHOW TABLE EXTENDED LIKE zipcodes PARTITION(state='PR');

Static Partitioning in Hive

Features of Static Partitioning

  • Partitions are added manually, so this is also known as manual partitioning.
  • Loading data with static partitioning is faster than with dynamic partitioning, so it is preferred when loading massive files.
  • In static partitioning, individual files are loaded into the specific partition we choose.
  • When inserting from a query, a WHERE clause is used to limit the rows loaded into a static partition.
  • Altering a static partition is allowed, whereas dynamic partitions do not support the ALTER statement.
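A static partition can also be filled from a query: the partition value is fixed in the statement, and a WHERE clause limits the rows to that partition (a sketch reusing the zipcodes tables from earlier):

```sql
-- Static partition insert: state='FL' is fixed, so the partition column is
-- omitted from the SELECT list and WHERE restricts rows to that partition.
INSERT INTO TABLE zipcodes PARTITION(state='FL')
SELECT RecordNumber, Country, City, Zipcode
FROM zipcodes_temp
WHERE state = 'FL';
```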

Let's create a table 'student' with the attributes student_name, father_name, and percentage, and partition it by 'section' in our default database.

CREATE TABLE student(student_name STRING, father_name STRING, percentage FLOAT)
PARTITIONED BY (section STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ',';

Create separate input files for sections A, B, C, and D.

Loading data from student_A partitioned with section 'A'

LOAD DATA LOCAL INPATH '/home/hdoop/Documents/student_A' INTO TABLE student
partition(section = 'A');

Loading data from student_B partitioned with section 'B'

LOAD DATA LOCAL INPATH '/home/hdoop/Documents/student_B' INTO TABLE student
partition(section = 'B');

Loading data from student_C partitioned with section 'C'

LOAD DATA LOCAL INPATH '/home/hdoop/Documents/student_C' INTO TABLE student
partition(section = 'C');

Loading data from student_D partitioned with section 'D'

LOAD DATA LOCAL INPATH '/home/hdoop/Documents/student_D' INTO TABLE student
partition(section = 'D');

Now browse HDFS (/user/hive/warehouse/) and check the student table directory to see how the partitions are laid out.
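For example, listing the table directory should show one subdirectory per partition (assuming the default warehouse location):

```shell
hdfs dfs -ls /user/hive/warehouse/student
# Expect one directory per section, e.g.:
# /user/hive/warehouse/student/section=A
# /user/hive/warehouse/student/section=B
# /user/hive/warehouse/student/section=C
# /user/hive/warehouse/student/section=D
```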

Partitioning an external table

CREATE EXTERNAL TABLE customer_external(id STRING, name STRING, gender STRING, state STRING) PARTITIONED BY (country STRING);

Now a partition can be added to the EXTERNAL table, using the ALTER TABLE ADD PARTITION command:

ALTER TABLE customer_external ADD PARTITION(country='UK') LOCATION '/user/hive/warehouse/customer/country=UK';
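Once the partition is registered, any files placed under that HDFS location become visible to queries, for example:

```sql
-- Rows read from files under /user/hive/warehouse/customer/country=UK
SELECT * FROM customer_external WHERE country='UK';
```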
