icon Register for Oracle RAC DBA Live Session on 12 May at 8.00 PM IST ENROLL NOW

Data Analyst Interview Question

Breadcrumb Abstract Shape
Breadcrumb Abstract Shape
Breadcrumb Abstract Shape
Breadcrumb Abstract Shape
Breadcrumb Abstract Shape
Breadcrumb Abstract Shape
interview questions and answers
  • 15 Apr, 2026
  • 0 Comments
  • 8 Mins Read

Data Analyst Interview Question

Interview Questions and Answers

1. What does a data analyst do?

A data analyst collects and studies data to find useful patterns and insights that help businesses make better decisions. They use tools like SQL, Excel, and dashboards to present findings clearly.

2. Why is EDA important?

Exploratory Data Analysis helps you understand the dataset before making decisions. It allows you to detect errors, understand trends, and ensure your analysis is based on reliable data.

3. What is data wrangling?

It is the process of transforming raw and unorganized data into a structured format so it can be easily analyzed.

4. Steps in an analytics project

  • Understand the business problem
  • Gather relevant data
  • Clean and explore the data
  • Analyze and visualize
  • Share insights and recommendations

5. Common challenges in data analysis

  • Poor quality data
  • Confusing requirements
  • Incorrect joins
  • Data access limitations
  • Changing business needs

6. WHERE vs HAVING

WHERE is used to filter records before grouping, while HAVING is applied after grouping to filter aggregated results.

7. INNER JOIN vs LEFT JOIN

INNER JOIN returns only matching records from both tables. LEFT JOIN returns all records from the left table and matching ones from the right.

8. Can alias be used in WHERE?

No, because WHERE runs before SELECT, so the alias is not yet created.

9. UNION vs INTERSECT vs EXCEPT

  • UNION combines datasets
  • INTERSECT returns common data
  • EXCEPT shows differences

10. Correlated vs Non-correlated subquery

A non-correlated query runs independently, while a correlated query depends on the outer query and runs repeatedly.

11. ROW_NUMBER vs RANK vs DENSE_RANK

ROW_NUMBER gives unique numbers, RANK skips numbers when tied, and DENSE_RANK does not skip numbers.

12. What is CTE?

A CTE is a temporary result set used to simplify complex SQL queries and improve readability.

13. Window function

It performs calculations across rows without grouping them, like ranking or running totals.

14. Stored procedure

A stored procedure is a pre-written SQL program stored in the database that can be executed multiple times.

15. Nth highest value

You can use ranking functions like DENSE_RANK to identify the nth highest value.

16. Normal distribution

It is a symmetric distribution where most values are centered around the mean.

17. Correlation vs causation

Correlation shows a relationship, but causation proves that one variable directly affects another.

18. Type I vs Type II error

Type I error means detecting something that is not true. Type II error means missing something that actually exists.

19. Variance vs Covariance

Variance measures spread of a single variable, while covariance measures how two variables change together.

20. Overfitting vs Underfitting

Overfitting happens when the model memorizes data, while underfitting happens when it fails to capture patterns.

21 . What is the difference between INNER JOIN, LEFT JOIN, and RIGHT JOIN?

INNER JOIN returns matching records from both tables. LEFT JOIN returns all records from the left table and matching rows from the right table, while RIGHT JOIN returns all rows from the right table and matching rows from the left table.

22.How do you find duplicate records in a table?

We use GROUP BY with HAVING COUNT(*) > 1 to identify duplicate values.
Example:

SELECT name, COUNT(*)
FROM employee
GROUP BY name
HAVING COUNT(*) > 1;

23. Write a SQL query to find the second highest salary in a table.

SELECT MAX(salary)
FROM employee
WHERE salary < (SELECT MAX(salary) FROM employee);

24 . Explain the difference between WHERE and HAVING.

WHERE filters rows before grouping is done, while HAVING filters grouped data after using GROUP BY.

25 . What are primary keys and foreign keys?

A Primary Key uniquely identifies each record in a table and cannot contain NULL values. A Foreign Key creates a relationship between two tables by referring to the primary key of another table.

26 . What are Pivot Tables, and how do you use them?

Pivot Tables are used to summarize and analyze large amounts of data quickly. We create them by selecting data and going to Insert → Pivot Table to arrange fields into rows, columns, and values.

27 . Explain VLOOKUP and INDEX-MATCH.

VLOOKUP searches for a value in the first column and returns matching data from another column. INDEX-MATCH is a more flexible lookup method that works faster and can search in any direction.

28 . How do you handle missing data in Excel?

Missing data can be handled by removing blank rows, replacing blanks with default values, or using functions like IFERROR and filters to identify missing entries.

29 . What are Conditional Formatting and its use cases?

Conditional Formatting automatically changes cell formatting based on conditions. It is used to highlight duplicates, low/high values, deadlines, or trends using colors and icons.

30 . How do you create charts for data visualization in Excel?

Select the data range, go to Insert → Charts, and choose a chart type like bar, line, or pie chart. Charts help present data visually for better understanding and analysis.

31 . What is the difference between mean, median, and mode?

Mean is the average of all values, median is the middle value in sorted data, and mode is the value that appears most frequently in a dataset.

32 . Explain standard deviation and variance.

Variance measures how far data values are spread from the mean, while standard deviation is the square root of variance and shows data spread in an easier-to-understand form.

33 . What is correlation vs causation?

Correlation means two variables are related or move together, while causation means one variable directly causes changes in another variable.

34 . What is a p-value in hypothesis testing?

A p-value shows the probability that the result happened by chance. A smaller p-value (usually less than 0.05) indicates stronger evidence against the null hypothesis.

35 . What is the difference between population and sample data?

Population includes all data or individuals being studied, while a sample is a smaller subset taken from the population for analysis.

36 . What are the most common tools for data visualization?

Common data visualization tools include Microsoft Excel, Power BI, Tableau, Google Data Studio, and Python libraries like Matplotlib and Seaborn. These tools help present data in charts, graphs, and dashboards.

37 . Explain the difference between Tableau and Power BI.

Tableau is known for advanced visualization and better handling of large datasets, while Power BI is more user-friendly and integrates well with Microsoft products like Excel and Azure.

38 . What is a dashboard, and what makes it effective?

A dashboard is a visual report that displays important data and metrics in one place. An effective dashboard is simple, interactive, and shows clear insights for decision-making.

39 . How do you choose the right chart type for data?

Choose charts based on the data type and purpose. Bar charts compare values, line charts show trends over time, and pie charts display percentage distribution.

40 . What are KPIs (Key Performance Indicators)?

KPIs are measurable values used to track business performance and goals. Examples include sales growth, customer retention, and profit percentage.

41 . What is the difference between raw data and processed data?

Raw data is unorganized and unprocessed information collected from sources. Processed data is cleaned, organized, and ready for analysis.

42 . Name a few common tools used by data analysts.

Common tools include Microsoft Excel, SQL, Power BI, Tableau, Python, and Google Sheets. These tools help in data analysis and visualization.

43 . What is the difference between Excel and SQL?

Excel is mainly used for spreadsheets, calculations, and small-scale analysis, while SQL is used to manage and query large amounts of data stored in databases.

44. What is a database?

A database is an organized collection of data stored electronically. It helps users store, manage, and retrieve information efficiently.

45 . What is a table in a database?

A table is a structure in a database that stores data in rows and columns. Each table contains related information about a specific topic.

46 . What is a primary key?

A primary key is a column that uniquely identifies each record in a table. It cannot contain duplicate or NULL values.

47 . What is the difference between rows and columns in a table?

Rows represent individual records, while columns represent specific attributes or fields of the data.

48 . What is a chart, and why do we use it?

A chart is a graphical representation of data. It helps users understand trends, comparisons, and patterns easily.

49 . What is the difference between bar chart and pie chart?

A bar chart compares values across categories, while a pie chart shows percentage distribution of a whole.

50 . What is Microsoft Excel mainly used for?

Microsoft Excel is mainly used for data entry, calculations, reporting, analysis, and creating charts or dashboards.

51 . What is data cleaning?

Data cleaning is the process of fixing errors, removing duplicates, and handling missing values to improve data quality.

52 . Why is data important for businesses?

Data helps businesses make informed decisions, improve performance, understand customers, and increase profits.

53 . What is the difference between qualitative and quantitative data?

Qualitative data describes qualities or categories, while quantitative data represents numerical values and measurements.

54 . Give an example of structured data.

Employee records stored in a database table with columns like ID, Name, and Salary are examples of structured data.

55 . Give an example of unstructured data.

Emails, videos, images, and social media posts are examples of unstructured data because they do not follow a fixed format.

56 . What is the difference between average (mean) and median?

Mean is the sum of values divided by total count, while median is the middle value in sorted data.

57 . What does KPI stand for?

KPI stands for Key Performance Indicator. It is used to measure business performance and success against goals.

Looking to start your career in data analytics? 
Join the best google data analytics institute and learn everything from basics to advanced concepts with real-time projects.

Get hands-on training, expert mentorship, and placement support to kickstart your career in data analytics.

Enroll now and take the first step toward becoming a Data Analyst!

lets talk - learnomate helpdesk

Book a Free Demo