Customise Consent Preferences

We use cookies to help you navigate efficiently and perform certain functions. You will find detailed information about all cookies under each consent category below.

The cookies that are categorised as "Necessary" are stored on your browser as they are essential for enabling the basic functionalities of the site. ... 

Always Active

Necessary cookies are required to enable the basic features of this site, such as providing secure log-in or adjusting your consent preferences. These cookies do not store any personally identifiable data.

No cookies to display.

Functional cookies help perform certain functionalities like sharing the content of the website on social media platforms, collecting feedback, and other third-party features.

No cookies to display.

Analytical cookies are used to understand how visitors interact with the website. These cookies help provide information on metrics such as the number of visitors, bounce rate, traffic source, etc.

No cookies to display.

Performance cookies are used to understand and analyse the key performance indexes of the website which helps in delivering a better user experience for the visitors.

No cookies to display.

Advertisement cookies are used to provide visitors with customised advertisements based on the pages you visited previously and to analyse the effectiveness of the ad campaigns.

No cookies to display.

Hive partition with example – Interview Question

Breadcrumb Abstract Shape
Breadcrumb Abstract Shape
Breadcrumb Abstract Shape
Breadcrumb Abstract Shape
Breadcrumb Abstract Shape
Breadcrumb Abstract Shape
  • User AvatarKiran Dalvi
  • 11 Jun, 2019
  • 0 Comments
  • 4 Mins Read

Hive partition with example – Interview Question

Hive Partitions

Create Hive Partition Table

1
2
3
4
5
6
7
8
CREATE TABLE zipcodes(
RecordNumber int,
Country string,
City string,
Zipcode int)
PARTITIONED BY(state string)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ',';

Create non-partition table

1
2
3
4
5
6
7
8
CREATE TABLE zipcodes_temp(
RecordNumber int,
Country string,
City string,
Zipcode int,
state string)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ',';

Load data to non-partition table

1
LOAD DATA INPATH '/data/zipcodes.csv' INTO TABLE zipcodes_temp;

Set the Hive property

1
2
3
SET hive.exec.dynamic.partition=true;
 
SET hive.exec.dynamic.partition.mode=nonstrict;

Load data from non-partition to partition table

1
INSERT INTO zipcodes SELECT * FROM zipcodes_temp;

Show All Partitions on Hive Table

1
2
3
4
5
6
7
8
9
10
11
12
13
SHOW PARTITIONS zipcodes;
 
+------------+
| partition |
+------------+
| state=AL |
| state=AZ |
| state=FL |
| state=NC |
| state=PR |
| state=TX |
+------------+
6 rows selected (0.074 seconds)

Add New Partition to the Hive Table

1
jdbc:hive2://127.0.0.1:10000>ALTER TABLE zipcodes ADD PARTITION (state='CA') LOCATION '/user/data/zipcodes_ca';
1
2
3
4
5
6
7
8
9
10
11
12
13
jdbc:hive2://127.0.0.1:10000> SHOW PARTITIONS zipcodes;
+------------+
| partition  |
+------------+
| state=AL   |
| state=AZ   |
| state=CA   |
| state=FL   |
| state=NC   |
| state=PR   |
| state=TX   |
+------------+
7 rows selected (0.081 seconds)

Rename or Update Hive Partition

1
jdbc:hive2://127.0.0.1:10000> ALTER TABLE zipcodes PARTITION (state='AL') RENAME TO PARTITION (state='NY');

Manually Renaming Partitions on HDFS

1
hdfs dfs -mv /user/hive/warehouse/zipcodes/state=NY /user/hive/warehouse/zipcodes/state=AL

When you manually modify the partitions directly on HDFS, you need to run MSCK REPAIR TABLE to update the Hive Metastore. Not doing so will result in inconsistent results. SELECT doesn’t show the renamed partition SHOW PARTITIONS still shows the older partition

1
2
3
4
5
6
7
8
9
10
11
12
jdbc:hive2://127.0.0.1:10000> SHOW PARTITIONS zipcodes;
+------------+
| partition  |
+------------+
| state=AZ   |
| state=CA   |
| state=FL   |
| state=NC   |
| state=NY   |
| state=PR   |
| state=TX   |
+------------+
1
jdbc:hive2://127.0.0.1:10000>MSCK REPAIR TABLE zipcodes SYNC PARTITIONS;

Drop Hive Partition

1
jdbc:hive2://127.0.0.1:10000> ALTER TABLE zipcodes DROP IF EXISTS PARTITION (state='AL');

Manually Dropping Partitions on HDFS

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
user@namenode:~/hive$ hdfs dfs -rm -R /user/hive/warehouse/zipcodes/state=AL
Deleted /user/hive/warehouse/zipcodes/state=AL
 
 
SHOW PARTITIONS still shows the deleted partitions.
 
jdbc:hive2://127.0.0.1:10000> SHOW PARTITIONS zipcodes;
+-----------------------------------+
|             partition             |
+-----------------------------------+
| state=AL                          |
| state=AZ                          |
| state=FL                          |
| state=NC                          |
| state=PR                          |
| state=TX                          |
+-----------------------------------+
6 rows selected (0.096 seconds)
 
In order to fix this, you need to run MSCK REPAIR TABLE as shown below.
 
 
jdbc:hive2://127.0.0.1:10000> MSCK REPAIR TABLE zipcodes;
jdbc:hive2://127.0.0.1:10000> SHOW PARTITIONS zipcodes;
+-----------------------------------+
|             partition             |
+-----------------------------------+
| state=AZ                          |
| state=FL                          |
| state=NC                          |
| state=PR                          |
| state=TX                          |
+-----------------------------------+
5 rows selected (0.096 seconds)

Filter Partitions

1
2
3
4
5
6
7
SHOW PARTITIONS zipcodes PARTITION(state='NC');
+------------+
| partition  |
+------------+
| state=NC   |
+------------+
1 row selected (0.182 seconds)

Specific Partition Location on HDFS

1
2
DESCRIBE FORMATTED zipcodes PARTITION(state='PR');
SHOW TABLE EXTENDED LIKE zipcodes PARTITION(state='PR');

Static Partition in HIve

Features of Static Partitioning

  • Partitions are manually added so it is also known as manual partition
  • Data Loading in static partitioning is faster as compare to dynamic partitioning so static partitioning is preferred when we have massive files to load.
  • In static partitioning individual files are loaded as per the partition we want to set.
  • where clause is used in order to use limit in static partition
  • Altering the partition in the static partition is allowed whereas dynamic partition doesn’t support Alter statement.

Let's create a table 'student' with the following attributes (student_name, father_name, and percentage ) and partition it using 'section' in our default database.

1
2
3
4
CREATE TABLE student(student_name STRING ,father_name  STRING ,percentage FLOAT)
partitioned by (section STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ',';

Create seperate file for section A, section B,section C and section D.

Loading data from student_A partitioned with section 'A'

1
2
LOAD DATA LOCAL INPATH '/home/hdoop/Documents/student_A' INTO TABLE student
partition(section = "A");

Loading data from student_B partitioned with section 'B'

1
2
LOAD DATA LOCAL INPATH '/home/hdoop/Documents/student_B' INTO TABLE student
partition(section = 'B');

Loading data from student_C partitioned with section 'C'

1
2
LOAD DATA LOCAL INPATH '/home/hdoop/Documents/student_C' INTO TABLE student
partition(section = 'C');

Loading data from student_D partitioned with section 'D' ​

1
LOAD DATA LOCAL INPATH '/home/hdoop/Documents/student_D' INTO TABLE student partition(section = 'D');

Now go to your HDFS(/user/hive/warehouse/) and check the student table to see how the partitions are made.

Partitioning an external table

1
CREATE EXTERNAL TABLE customer_external(id STRING, name STRING, gender STRING, state STRING) PARTITIONED BY (country STRING);

Now a partition can be added to the EXTERNAL table, using the ALTER TABLE ADD PARTITION command:

1
ALTER TABLE customer_external ADD PARTITION(country='UK') LOCATION '/user/hive/warehouse/customer/country=UK'