Data Analyst Interview Questions
Data Analyst Interview Questions
EXCEL:
Q1. Differentiate between Power BI and Excel.
Answer:
-
Tabular Reports: Excel is better; Power BI struggles with purely tabular layouts.
-
Duplicate Tables: Excel supports; Power BI doesn’t.
-
Analytics: Excel supports advanced analytics (pivot tables, formulas), while Power BI focuses on simple analytics & KPIs.
-
Interactivity: Power BI allows cross-filtering, Excel does not.
-
Use Cases:
-
Excel – detailed calculations, financial models.
-
Power BI – dashboards, KPIs, interactive analytics.
-
Q2. Tell us about the Excel add-ins for Power BI.
Answer:
The Excel BI Toolkit includes:
-
Power Query – For data connection & transformation.
-
Power Pivot – For data modeling & calculations.
-
Power View – For interactive visualization.
-
Power Map – For 3D geographic analysis.
Q3. Why INDEX + MATCH is Better Than VLOOKUP
While VLOOKUP is many users’ first tool for looking up data, INDEX + MATCH is widely considered a superior, more flexible, and more powerful combination. Here’s a detailed breakdown of why.
1. Works Left of the Lookup Column (The Biggest Advantage)
-
VLOOKUP Limitation: The
VLOOKUP(lookup_value, table_array, col_index_num, [range_lookup])
function can only look for values in the left-most column of thetable_array
and return a value from a column to the right. If your data is structured with the identifier (e.g., Employee ID) not in the first column, VLOOKUP fails. -
INDEX + MATCH Solution:
MATCH()
finds the position (row number) of your lookup value anywhere in a column.INDEX()
then returns the value from that same row number in any column you specify, regardless of whether it’s to the left or right. This gives you complete freedom in structuring your worksheets.
2. Better Performance on Large Datasets
-
VLOOKUP’s Inefficiency: VLOOKUP examines the entire
table_array
you provide. On a 100,000-row dataset where yourtable_array
is columns A through Z, VLOOKUP is scanning all 100,000 rows and all 26 columns, even if you just want to return data from column C. This is computationally expensive. -
INDEX + MATCH Efficiency: You define two precise ranges:
-
MATCH()
only looks in the single, specific lookup column (e.g.,A:A
). -
INDEX()
only looks in the single, specific return column (e.g.,C:C
).
This drastically reduces the number of cells Excel needs to process, leading to faster calculation times, especially in large workbooks.
-
3. Inserting or Deleting Columns is Safer
-
VLOOKUP’s Fragility: VLOOKUP uses a static column index number (e.g.,
3
to return the 3rd column). If you insert or delete a column within yourtable_array
, this number can break, and you might start returning data from the wrong column without immediately realizing it. -
INDEX + MATCH’s Robustness: Since you directly reference the entire return column (e.g.,
C:C
), inserting or deleting other columns elsewhere on the sheet does not affect the formula. It will always pull from column C.
4. Exact Match by Default
-
While both can do exact match (
FALSE
or0
), VLOOKUP’s fourth argument is[range_lookup]
, which defaults toTRUE
(approximate match). This is a common source of errors for beginners who forget to set it toFALSE
. -
The
MATCH()
function’s third argument for exact match is0
, which has no default and must be explicitly stated, making the intent clearer.
Example Explained:
=INDEX(Salary, MATCH("John", Name, 0))
-
MATCH("John", Name, 0)
: This part is the “finder.”-
It searches for the exact text “John” (the
0
signifies exact match) within the range namedName
. -
Let’s say “John” is in the 5th row of the
Name
range. The MATCH function returns the number5
.
-
-
INDEX(Salary, 5)
: This part is the “retriever.”-
It goes to the range named
Salary
. -
It finds the value in the 5th row of that range and returns it.
-
-
The result is John’s salary. The
Name
andSalary
ranges can be anywhere on the sheet, in any order.
Q4. Create a Dynamic Dashboard
A dynamic dashboard in Excel is an interactive report that allows users to quickly visualize and analyze data by filtering and slicing it in real-time.
Core Components & Steps:
1. Foundation: Pivot Tables + Pivot Charts
-
Pivot Tables: Start by creating Pivot Tables from your raw data. They are the engine of your dashboard. They allow you to summarize millions of rows of data into meaningful categories (e.g., Sales by Region, Profit by Product Category).
-
Pivot Charts: Create charts (bar, line, pie, etc.) directly from your Pivot Tables. These visualizations become the core elements of your dashboard. Right-click a Pivot Table ->
PivotChart
to create one that is automatically linked.
2. Interactivity: Slicers & Timelines
-
Slicers: These are visual buttons that let you filter the data. Instead of digging into filter drop-downs, users can just click on “West” in a Region slicer to see data only for the West region.
-
To Add: Click inside your Pivot Table -> Go to the
PivotTable Analyze
tab -> ClickInsert Slicer
-> Choose the fields you want to filter by (e.g., Region, Product, Salesperson).
-
-
Timelines: A special type of slicer designed specifically for dates. It provides a graphical way to filter data by days, months, quarters, or years.
-
To Add:
PivotTable Analyze
tab ->Insert Timeline
-> Select your date column.
-
3. Linking Slicers Across Multiple Pivot Tables/Charts (The Key to Dynamism)
This is what makes a dashboard truly cohesive. One click filters every single chart on the page.
-
Right-click on the slicer.
-
Select Report Connections (or Slicer Settings).
-
A dialog box will appear showing all the Pivot Tables in your workbook.
-
Check the boxes for every Pivot Table that you want that slicer to control.
-
Now, when you click an item in the slicer, all the connected Pivot Tables and their corresponding charts will update simultaneously.
Best Practices:
-
Layout: Arrange your charts, tables, and slicers on a single, clean worksheet.
-
Design: Use a consistent color scheme and clear titles. The
Dashboard
tab should be for presentation only; keep raw data and Pivot Tables on other hidden sheets.
Q5. Power Query in Excel (The Game Changer)
Power Query (known as Get & Transform Data in Excel 2016+) is a powerful data connection and transformation tool. It’s an ETL tool built right into Excel.
ETL: Extract, Transform, Load
1. Extract
-
This is the “E” in ETL. Power Query can connect to a vast array of data sources far beyond what standard Excel can import. This includes:
-
Files: Excel workbooks, CSV/Text, XML, JSON, PDF (preview).
-
Databases: SQL Server, Access, Oracle, MySQL.
-
Cloud & Other Sources: Azure, SharePoint, OData feeds, web pages (scraping data from HTML tables), and many more.
-
2. Transform
-
This is the “T” in ETL and Power Query’s core strength. This is where you clean, reshape, and prepare your data before it ever touches your worksheet. The original data source remains untouched. Common transformations include:
-
Cleaning: Removing duplicates, trimming spaces, changing data types (text to numbers/dates), replacing values.
-
Shaping: Filtering rows, removing unnecessary columns, splitting columns (e.g., “John Smith” into “First Name” and “Last Name”), merging columns.
-
Restructuring: Grouping data, pivoting (unpivoting is a superpower – turning cross-tab data into a clean table), adding custom columns with formulas (using the M language).
-
Combining: Appending multiple tables (stacking them on top of each other) and merging tables (like a VLOOKUP but between entire queries, much more powerful and robust).
-
3. Load
-
This is the “L” in ETL. Once you are satisfied with the transformed data, you load it into Excel. You have two main options:
-
Load to Worksheet: Creates a static table on a worksheet.
-
Load to Data Model: Loads the data into the in-memory Power Pivot engine, which is essential for handling very large datasets and creating complex data models with relationships between multiple tables.
-
Why It’s Revolutionary:
-
Reproducibility & Automation: The entire ETL process you create is saved as a “query.” When your source data is refreshed (e.g., you get a new CSV file every day), you simply right-click the query result in Excel and hit Refresh. Power Query will automatically run all the cleaning steps again on the new data, outputting a perfectly formatted and cleaned table in seconds. This eliminates hours of manual, error-prone work.
-
Data Integrity: Your raw data is never altered. All changes are stored as steps in the query.
Q6. Define Data Validation in Excel
Data Validation is a powerful Excel feature that allows you to control exactly what type of data can be entered into a cell or a range of cells. Its primary purpose is to ensure data integrity and accuracy by preventing users from entering invalid data.
Key Details:
-
How it Works: You set specific rules (criteria) for a cell. If a user tries to enter data that breaks these rules, Excel can reject the entry or show a warning message.
-
Common Validation Criteria:
-
Whole Number / Decimal: Restrict entry to numbers only (e.g., ages must be between 18 and 65).
-
List: Creates a dropdown list from a predefined set of options (e.g., Department: Sales, Marketing, Engineering).
-
Date / Time: Restrict entry to a valid date or time within a specific range.
-
Text Length: Limit the number of characters (e.g., a password must be at least 8 characters).
-
Custom: Use a formula to create complex rules (e.g., ensure a cell value is greater than the value in another cell).
-
-
Input and Error Messages: You can create a friendly “Input Message” (like a tooltip) that guides the user on what to enter. You can also customize the “Error Alert” that appears when invalid data is entered, making it more informative.
-
Location: Found on the
Data
tab, in theData Tools
group.
Why it’s important: It eliminates many common data entry errors, standardizes entries, and makes data easier to sort, filter, and analyze later.
Q7. How can you get the current date and time in Excel?
There are two primary methods to get the current date and time: using functions and using a keyboard shortcut.
Method 1: Using Functions (Volatile)
-
For Current Date: Use the
TODAY()
function.-
Formula:
=TODAY()
-
Returns the current date as per your system’s clock. It does not include time.
-
-
For Current Date AND Time: Use the
NOW()
function.-
Formula:
=NOW()
-
Returns the current date and time as per your system’s clock.
-
Important Note: Both TODAY()
and NOW()
are volatile functions. This means they recalculate every time the worksheet recalculates (e.g., when you open the file, edit a cell, or press F9). The value will not remain static unless you copy it and use Paste Special > Values.
Method 2: Using Keyboard Shortcuts (Static)
-
For Static Current Date: Press
Ctrl + ;
(semicolon). -
For Static Current Time: Press
Ctrl + Shift + :
(colon). -
For Static Current Date & Time: Press
Ctrl + ;
, then pressSpace
, then pressCtrl + Shift + :
.
Key Difference: The keyboard shortcuts enter the current date/time as a static value. It will not update automatically; it’s like typing the value in manually. This is useful for timestamps (e.g., recording when a specific entry was made).
Q8. How does the AND() function work in Excel?
The AND()
function is a logical function used to test multiple conditions at the same time. It returns either TRUE
or FALSE
.
-
Syntax:
=AND(logical1, [logical2], ...)
-
logical1
,logical2
, etc., are the conditions you want to test. You can test up to 255 conditions.
-
-
How it Works: The
AND
function returnsTRUE
only if ALL the specified conditions are true. If any single condition is false, the entire function returnsFALSE
.
Example:
You have a student’s score in cell A2. You want to check if they passed, which requires a score above 50 AND below 100.
=AND(A2>50, A2<100)
-
If A2 is
75
, the result isTRUE
(both conditions are met). -
If A2 is
120
, the result isFALSE
(the second condition fails). -
If A2 is
40
, the result isFALSE
(the first condition fails).
It is extremely powerful when nested inside other functions like IF
to create complex conditional formulas:
=IF(AND(A2>50, A2<100), "Pass", "Fail")
Q9. What is a Macro in Excel?
A Macro is a piece of programming code, written in VBA (Visual Basic for Applications), that automates repetitive tasks in Excel. It’s essentially a recorded or written sequence of commands and actions that you can play back whenever you need to perform that task.
Key Details:
-
Recording: The easiest way to create a macro is to use the Macro Recorder (
View
tab >Macros
>Record Macro
). You perform the steps once manually, and Excel translates your actions into VBA code. -
Editing: Recorded macros can be viewed and edited in the VBA Editor (opened by pressing
Alt + F11
). This allows for more complex and powerful automation. -
Execution: You can run a macro from the Macros dialog box (
View
>Macros
>View Macros
), by assigning it to a button, or a keyboard shortcut (likeCtrl + Shift + C
). -
Use Cases:
-
Formatting large reports consistently.
-
Combining data from multiple sheets or workbooks.
-
Automating complex calculations and data manipulation.
-
Creating custom functions and user forms.
-
Security Note: Macros can contain harmful code. Excel has security settings to disable them by default. Only enable macros from trusted sources.
Q10. What is VLOOKUP in Excel?
VLOOKUP (Vertical Lookup) is one of Excel’s most popular and powerful lookup and reference functions. It searches for a specific value in the first column of a table and returns a corresponding value from a different column in the same row.
Syntax: =VLOOKUP(lookup_value, table_array, col_index_num, [range_lookup])
-
lookup_value
: The value you want to search for (e.g., an Employee ID). -
table_array
: The range of cells that contains the data (the table). The first column of this range must contain thelookup_value
. -
col_index_num
: The column number (within thetable_array
) from which to retrieve the return value. The first column is 1, the second is 2, etc. -
[range_lookup]
: Optional. UseFALSE
for an exact match. UseTRUE
(or omit) for an approximate match. You will almost always useFALSE
.
Example:
You have a product table where Column A has Product ID
and Column B has Price
.
You want to find the price for product ID “P123”.
=VLOOKUP("P123", A2:B100, 2, FALSE)
This formula looks for “P123” in the range A2:A100 and returns the value from the 2nd column (column B) of the same row.
Limitation: VLOOKUP can only look to the right. It cannot return a value from a column to the left of the lookup column. For more flexibility, use the newer XLOOKUP or INDEX/MATCH combination.
Q11. How can you make a dropdown list in MS Excel?
You create a dropdown list using the Data Validation feature.
Steps:
-
Prepare your list: Type the options for your dropdown in a single column or row on your worksheet (e.g., in cells
Z1:Z5
). Alternatively, you can type them directly in the Data Validation box. -
Select the cell: Click on the cell where you want the dropdown list to appear.
-
Open Data Validation: Go to the
Data
tab >Data Tools
group > Click Data Validation. -
Set the Criteria:
-
In the
Settings
tab, underAllow:
, select List. -
In the
Source:
box, either:-
Click the range selector icon and select the cells containing your list (e.g.,
$Z$1:$Z$5
). -
Type the list items directly, separated by commas (e.g.,
Yes, No, Maybe
).
-
-
-
(Optional) Configure the
Input Message
andError Alert
tabs to guide the user. -
Click OK.
The selected cell will now have a dropdown arrow. Clicking it will show the list of options you defined, and the user can only choose from those options.
Q12. How are duplicate entries found in Excel?
There are several methods to find and manage duplicates.
Method 1: Conditional Formatting (For Visual Identification)
This is the quickest way to highlight duplicates.
-
Select the range of cells you want to check.
-
Go to the
Home
tab >Styles
group > Conditional Formatting. -
Select
Highlight Cells Rules
> Duplicate Values. -
Choose a formatting style (e.g., light red fill) and click OK.
All duplicate values in the selected range will be instantly highlighted.
Method 2: The Remove Duplicates Tool (For Deletion)
This tool permanently deletes duplicate rows.
-
Select any cell within your data range.
-
Go to the
Data
tab >Data Tools
group > Remove Duplicates. -
A dialog box appears. If your data has headers, check the “My data has headers” box.
-
Select the columns you want to check for duplicates. If you select all columns, a duplicate is only found if every value in the row is identical to another row.
-
Click OK. Excel will remove the duplicate rows and show a message stating how many duplicates were removed and how many unique values remain.
Method 3: Using a Formula (For Flagging or Counting)
You can use a formula to identify if a value is a duplicate.
-
To Flag a Duplicate: The
COUNTIF
function is common for this.-
=COUNTIF($A$2:$A$10, A2)>1
-
This formula, entered in cell B2 and copied down, checks how many times the value in A2 appears in the range A2:A10. If the count is greater than 1, it returns
TRUE
(meaning it’s a duplicate).
-
-
To List Unique Values: In newer versions of Excel (Office 365), you can use the
UNIQUE()
function to extract a list of all distinct values from a range.-
=UNIQUE(A2:A100)
-
Q13. What is a waterfall chart and when do we use it?
A waterfall chart shows how an initial value is incrementally increased and/or decreased by a series of positive and negative contributions, resulting in a final value. Each “step” is a column that starts where the previous one ended, so you can see the cumulative effect of components.
When to use
Explaining bridges between two numbers: Revenue → Gross Profit → Operating Profit → Net Income.
Budget/forecast walks: Starting balance → adds → subtracts → ending balance.
Variance analysis: Plan vs. Actual with reasons (price, volume, mix).
How to build in Excel (Office 2016+)
Arrange data in two columns (Category, Amount) in logical order. Include starting and ending totals.
Insert → Charts → Waterfall.
Right-click the starting and ending columns → Set as Total (so they “sit on the baseline”).
Format data series for readability (data labels, colors for increases/decreases, sort order if needed).
Older Excel (pre-2016)
Use a stacked column workaround with helper columns: Base, Increase, Decrease, Totals, and set Base to be invisible.
Interview tip
Emphasize that waterfall charts are ideal for storytelling with numbers—stakeholders can instantly see what drove the change and by how much.
Q14. How can you highlight cells with negative values in Excel?
Fastest method (built-in rule)
Select the range.
Home → Conditional Formatting → Highlight Cells Rules → Less Than…
Enter 0, choose a format (e.g., Light Red Fill), OK.
Formula-based rule (to highlight entire rows)
Select all rows of interest, e.g., A2:F500.
Conditional Formatting → New Rule → Use a formula to determine which cells to format.
Formula (assuming the value to test is in column D):
=$D2<0
Set the format (fill/bold), OK.
Alternative (number formatting only)
Format Cells → Number → Custom → 0;[Red]-0;0
This shows negative numbers in red, but doesn’t highlight the cell—and won’t trigger filters.
Pitfalls
If cells contain text like “–” or “-5” stored as text, the rule Less Than 0 won’t catch them—convert to numbers first.
Q15. How can you clear all formatting without removing contents?
Clear formats
Select the range (or press Ctrl+A twice to select the entire sheet’s used range).
Home → Clear (eraser icon) → Clear Formats.
Other useful clears
Clear All: removes contents + formats + comments (be careful).
Clear Contents: keeps formats.
Paste Special → Values: replaces formulas with values but keeps formats (or combine with Clear Formats afterwards).
Shortcuts
Alt → H → E → F (Windows) opens Clear Formats via key tips.
Q16. What is a Pivot Table, and what are its sections?
A Pivot Table summarizes large datasets dynamically—drag fields to quickly aggregate and slice data without writing formulas.
Sections
Filters: Top-level slicer for the entire pivot (e.g., Year).
Columns: Fields that define columns (e.g., Quarter).
Rows: Fields that define rows (e.g., Product).
Values: Aggregations (Sum, Count, Avg, Distinct Count if Data Model is used).
Key features to mention
Grouping (dates into months/quarters/years, numbers into bins).
Calculated Fields/Items (classic pivots) and Measures (Power Pivot / Data Model).
Slicers & Timelines for interactive filtering.
Show Values As (e.g., % of Row Total, Running Total, Difference From).
Q17. Can you make a Pivot Table from multiple tables?
Yes—if there’s a relationship between the tables. You do this via the Data Model (Power Pivot under the hood).
Steps
Load each table to Excel as an Excel Table (Ctrl+T) with clean headers and a unique key in the lookup (dimension) tables.
Insert → PivotTable → From Table/Range → check Add this data to the Data Model.
Repeat for other tables or use Data → Get Data (Power Query) → load to Data Model.
Data → Relationships (or Power Pivot → Diagram View) to define one-to-many relationships (e.g., Sales[ProductID] → Products[ProductID]).
Insert the Pivot from the Data Model and use fields across tables.
Best practice
Use a star schema (fact table + dimensions).
Ensure lookup keys are unique on the “one” side.
Q18. How can we select all blank cells in Excel?
Go To Special
Select the dataset.
Press F5 (Go To) → Special… → choose Blanks → OK.
Excel selects only truly empty cells (not zeros, not formulas returning “”).
If cells contain formulas that return empty text “”
Use a helper column with =LEN(A2)=0 or =A2=”” and filter TRUEs; or
Find & Select → Replace might not help; consider converting “” to actual blanks via formula or Power Query.
Common use
After selecting blanks, type a value (e.g., “N/A”) and press Ctrl+Enter to fill all selected blanks at once.
Q19. What are the most common questions to ask before creating a dashboard?
Who is the audience and what decisions will this drive? (Executive vs. operational users.)
Top 3–5 KPIs and their precise definitions. (Calculation rules, time windows, filters.)
Data sources & ownership. (Systems, refresh method, access/credentials.)
Update frequency & latency tolerance. (Real-time, hourly, daily, weekly.)
Granularity needed. (Daily vs. monthly; transaction vs. summary.)
Drill-down paths & filters. (By region, product, segment.)
Layout constraints. (Screen size, printing, color-blind friendly palette.)
Version/Tooling. (Excel version, Power Query/Power Pivot availability, add-ins.)
Data quality & caveats to display. (Known gaps, exclusions.)
Security. (Row-level restrictions, masking sensitive fields.)
Interview bonus
Explain you’ll prototype quickly, test with stakeholders, then iterate based on feedback.
Q20. What is a Print Area and how can you set it?
Print Area is the range that Excel prints by default for that sheet.
Set
Select the range to print.
Page Layout → Print Area → Set Print Area.
Add to an existing print area
Page Layout → Print Area → Add to Print Area (creates discontiguous areas, each prints on a new page).
Clear
Page Layout → Print Area → Clear Print Area.
Related settings
Print Titles (repeat header rows/columns on each page).
Page Break Preview to adjust page fits.
Fit to One Page: Page Layout → Size/Scaling (careful with unreadable fonts).
Q21. What steps can you take to handle slow Excel workbooks?
Quick wins
Manual calculation when modeling: Formulas → Calculation Options → Manual; press F9 to calculate.
Save as .XLSB (binary) to reduce file size and open/save time.
Minimize volatile functions: OFFSET, INDIRECT, TODAY, NOW, RAND, RANDBETWEEN. Replace with non-volatile alternatives (INDEX instead of OFFSET, helper columns, static dates).
Avoid array-heavy formulas across whole columns. Limit ranges (e.g., A2:A50000 instead of A:A in SUMPRODUCT).
Replace complex formulas with helper columns (precompute once, aggregate later).
Convert heavy formulas to values where calculations are final (Paste Special → Values).
Use Excel Tables & Structured References—they auto-size and are efficient with XLOOKUP, SUMIFS, COUNTIFS.
Reduce conditional formatting rules and ranges; consolidate where possible.
Limit cross-workbook links and 3-D references.
Pictures, shapes, and styles: remove unneeded objects; clean duplicate styles.
Pivot caches: reuse the same cache for similar pivots; disable “Defer Layout Update” while designing.
Power Query / Power Pivot tips
Push heavy transformations into Power Query (query folding to the source where possible).
Use the Data Model (xVelocity) for large analytics instead of worksheet formulas—measures are faster and memory-efficient.
Hardware & options
64-bit Excel for very large models.
Increase RAM; enable multi-threaded calculation (File → Options → Advanced → Formulas).
Diagnostics
Formulas → Evaluate Formula to identify slow constructs.
Check Workbook Statistics; and use Inquire add-in (if available) to spot bloats.
Q22. Can you sort multiple columns at one time?
Yes—use multi-level sorting via the Sort dialog.
Steps
Select your full dataset (include headers).
Data → Sort.
Sort by: choose primary column (e.g., Region), order (A→Z).
Add Level: choose secondary column (e.g., Customer), order (A→Z).
Add Level again for tertiary sort (e.g., Order Date, Oldest to Newest).
Use Options… to enable case-sensitive sort or set Sort left to right for row-wise sorting.
Notes
Excel sorts are not stable by default—so always define all necessary levels.
Use Custom Lists for business-specific orders (e.g., High, Medium, Low or days of week).
Tiny practice dataset you can mention in interviews
Imagine a small Income Walk:
Starting Revenue: 1,000
COGS: −400
Marketing: −120
R&D: −80
Other Income: +30
Taxes: −90
Net Income: 340
This maps perfectly to a waterfall chart and shows how you’d narrate it.
If you want, I can spin up a tiny sample Excel file for you with:
A dataset,
A pivot built from two related tables,
A waterfall chart,
And a few conditional formatting rules—so you can practice and demo.
SQL:
1. What is the difference between INNER JOIN and LEFT JOIN in SQL?
INNER JOIN
Returns only rows that have matching values in both tables.
If a record in the left table does not have a match in the right table, it will not appear.
Best used when you need only the intersection of data.
Example: Get only customers who placed orders.
SELECT A.customer_id, B.order_id FROM Customers A INNER JOIN Orders B ON A.customer_id = B.customer_id;
LEFT JOIN
Returns all rows from the left table, plus the matching rows from the right.
If no match is found, NULLs are returned for right-side columns.
Best used when you don’t want to lose unmatched data from the left table.
Example: Get all customers, even if they didn’t order.
SELECT A.customer_id, B.order_id FROM Customers A LEFT JOIN Orders B ON A.customer_id = B.customer_id;
2. How do you handle missing data in a dataset?
Options:
Remove missing values: Use dropna() in Python Pandas. Suitable when missing data is small (<5%).
Imputation: Replace with mean, median, or mode. Works well for numerical/categorical features.
Predictive imputation: Use regression, KNN imputer, or ML models to estimate missing values.
Domain-specific handling: For time series, forward/backward fill may work.
Decision Rule:
If <5% missing → drop.
If important feature → impute.
If large portion missing → consider removing feature.
3. What are the key steps in a data analysis process?
- Define the problem – What business question are we answering?
- Collect data – From databases, APIs, Excel, or surveys.
- Clean & preprocess data – Handle missing values, duplicates, outliers.
- Analyze data – Using statistics, SQL, Python (Pandas, NumPy).
- Visualize results – Power BI, Tableau, Matplotlib, Seaborn.
- Interpret insights – Provide actionable recommendations for business.
4. Difference between OLAP and OLTP
Feature OLTP (Online Transaction Processing) OLAP (Online Analytical Processing)
Purpose Daily operations (insert/update/delete) Business intelligence, reporting
Data Structure Highly normalized Denormalized
Speed Fast writes Fast reads
Example ATM, Banking apps, e-commerce Sales dashboards, Data Warehouses
5. Explain the use of CALCULATE() in Power BI
Purpose: Changes the filter context of a measure.
Example: Calculate sales only for 2023.
Total2023Sales = CALCULATE(SUM(Sales[Amount]), Sales[Year] = 2023)
Useful for:
Year-to-date, quarter-to-date calculations.
Applying multiple conditions dynamically.
6. How do you identify outliers in data?
Statistical methods:
IQR Method: Outliers lie outside [Q1 – 1.5*IQR, Q3 + 1.5*IQR].
Z-score: Points with |z| > 3 are extreme.
Visualization:
Boxplot → shows whiskers and outliers.
Scatter plot → reveals unusual points.
Why important?
Outliers can skew results, so we either remove or transform them.
7. What is normalization in SQL databases?
Definition: Organizing data into tables to remove redundancy and dependency.
Forms:
1NF: Atomic (no repeating groups, no multiple values in a cell).
2NF: No partial dependency (every non-key depends on full primary key).
3NF: No transitive dependency (non-key fields depend only on the primary key).
Benefit: Reduces redundancy, improves data integrity.
8. Difference between WHERE and HAVING in SQL
WHERE: Filters rows before aggregation.
HAVING: Filters after aggregation.
-- Example SELECT dept, COUNT(*) FROM Employees GROUP BY dept HAVING COUNT(*) > 5; -- Only departments with more than 5 employees
9. What are measures and dimensions in Power BI?
Dimension: Descriptive attributes (Product, Region, Customer).
Measure: Numeric calculation (Sales, Revenue, Profit).
Example:
Dimension → “Region = North”
Measure → “Total Sales = $1M”
10. Scenario: Sales dropped in a region, how will you analyze it?
Compare Year-on-Year (YoY) or Month-on-Month (MoM) trends.
Filter by region, analyze product mix & quantities sold.
Check customer churn, pricing strategy, promotions.
Investigate external factors: competition, supply chain issues, seasonal effects.
Present insights with visuals → e.g., line charts for trend, pie charts for product mix.
11. SQL query to get the second highest salary
SELECT MAX(salary) AS Second_Highest FROM employees WHERE salary < (SELECT MAX(salary) FROM employees);
Alternative (using RANK):
SELECT salary FROM ( SELECT salary, DENSE_RANK() OVER (ORDER BY salary DESC) AS rnk FROM employees ) t WHERE rnk = 2;
12. How do you optimize a large dataset in Power BI?
Remove unused columns.
Use Star Schema (fact + dimension).
Prefer DAX over M queries for calculations.
Pre-aggregate at SQL/query level.
Disable auto-date/time option.
Use incremental refresh for large datasets.
13. What is data wrangling?
Process of cleaning, transforming, and mapping raw data into a usable format.
Includes:
Handling missing values.
Removing duplicates.
Normalizing formats (date, currency).
Creating new derived columns.
Tools: Power Query, Python (Pandas), SQL.
14. What is the use of GROUP BY in SQL?
Used to group rows based on a column and apply aggregation.
Example:
SELECT department, COUNT(*) FROM employees GROUP BY department;
15. Explain a project where you used data to drive business decisions.
(Customize with your real experience, here’s a model answer)
I built an e-commerce sales dashboard using Power BI. I analyzed top-selling products, profit margins, and customer segments. I noticed high returns from one region. After root cause analysis, we found a logistics issue. Fixing it reduced return rates by 18% and improved customer satisfaction.
16. SQL query – Running total of sales by product category
SELECT sale_id, product_category, sale_date, sale_amount, SUM(sale_amount) OVER ( PARTITION BY product_category ORDER BY sale_date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW ) AS running_total FROM SalesData;
Use Case: Show cumulative sales trend over time per category.
17. INNER JOIN vs LEFT JOIN vs RIGHT JOIN vs FULL OUTER JOIN
INNER JOIN → Only matches (customers with orders).
LEFT JOIN → All left + matches from right (all customers, even without orders).
RIGHT JOIN → All right + matches from left (all employees, even without tasks).
FULL OUTER JOIN → Everything (all customers + suppliers, matched and unmatched).
Scenario table:
Join Type Use Case
INNER JOIN Customers who placed orders
LEFT JOIN All customers (even if no orders)
RIGHT JOIN All employees (even if no tasks)
FULL OUTER JOIN Combine all customers & suppliers
18. When would you use a CTE instead of subquery/temp table?
CTE (Common Table Expression): Temporary result set defined using WITH.
When to use:
Improve readability (multi-step queries).
Recursive queries (org hierarchy, tree structure).
Avoid repeating same subquery logic.
When temp tables are overkill.
Example:
WITH FilteredSales AS ( SELECT product_id, SUM(sale_amount) AS total_sales FROM Sales GROUP BY product_id ) SELECT product_id FROM FilteredSales WHERE total_sales > 1000;
Recursive Example: Employee hierarchy.
19. SQL query to pivot rows → columns (without PIVOT)
SELECT Region, MAX(CASE WHEN Quarter = 'Q1' THEN Revenue END) AS Q1_Revenue, MAX(CASE WHEN Quarter = 'Q2' THEN Revenue END) AS Q2_Revenue FROM Sales GROUP BY Region;
Why not PIVOT?
Not supported in MySQL/SQLite.
CASE WHEN is universal + flexible.
20. Monthly Revenue per Product Category
Task: Calculate the total revenue for each product category on a monthly basis.
Answer:
SELECT TO_CHAR(s.sale_date, 'YYYY-MM') AS sales_month, -- Use FORMAT() for SQL Server, strftime() for SQLite p.product_category, SUM(s.revenue_usd) AS total_revenue FROM Sales s JOIN Products p ON s.product_id = p.product_id GROUP BY sales_month, p.product_category ORDER BY sales_month, p.product_category;
Explanation: This query joins the Sales
and Products
tables to link each sale to its product category. It then groups the results by the year-month of the sale date and the product category, summing up the revenue for each group. The ORDER BY
clause ensures the output is sorted chronologically and by category.
21. Top 3 Most Profitable Products in Each Category
Task: For each category, find the top 3 products by total revenue, including ties.
Answer:
WITH ProductRevenue AS ( SELECT p.product_category, p.product_name, SUM(s.revenue_usd) AS total_revenue FROM Sales s JOIN Products p ON s.product_id = p.product_id GROUP BY p.product_category, p.product_name ) SELECT product_category, product_name, total_revenue, DENSE_RANK() OVER ( PARTITION BY product_category ORDER BY total_revenue DESC ) AS rank_in_category FROM ProductRevenue QUALIFY DENSE_RANK() OVER (PARTITION BY product_category ORDER BY total_revenue DESC) <= 3;
Explanation:
-
CTE (Common Table Expression): The
ProductRevenue
CTE calculates the total revenue for each product. -
Main Query: The main query uses the
DENSE_RANK()
window function to rank products within their category based on total revenue (descending).DENSE_RANK()
is used to handle ties; if two products have the same revenue, they get the same rank, and the next rank is not skipped. -
QUALIFY Clause: This filters the results to only include rows where the rank is 3 or lower. (Note:
QUALIFY
is used in databases like Snowflake and BigQuery. In SQL Server/PostgreSQL, you would use a subquery or another CTE to filter the ranked results).
22. Average Downtime by Maintenance Type and Asset Over Last Year
Task: Calculate the average downtime for each maintenance type and asset, only for assets with at least 5 logs in the last year.
Answer:
SELECT asset_id, maintenance_type, AVG(downtime_hours) AS average_downtime_hours FROM Maintenance_Logs WHERE log_date >= CURRENT_DATE - INTERVAL '1 year' -- Adjust for your SQL dialect GROUP BY asset_id, maintenance_type HAVING COUNT(log_id) >= 5 ORDER BY asset_id, maintenance_type;
Explanation: The WHERE
clause filters logs from the last 365 days. The GROUP BY
calculates the average downtime for each asset and maintenance type combination. The HAVING
clause is crucial as it filters these groups to only include those with a count of 5 or more records, ensuring the average is statistically significant.
23. Products with No Sales in the Last 6 Months
Task: List all products that have had zero sales in the last 6 months.
Answer:
SELECT p.product_name, p.product_category FROM Products p WHERE p.product_id NOT IN ( SELECT DISTINCT s.product_id FROM Sales s WHERE s.sale_date >= CURRENT_DATE - INTERVAL '6 months' );
Explanation: This query uses a subquery to find all distinct product_id
values that have sales in the last 6 months. The main query then selects all products whose product_id
is not in that list, effectively identifying the stagnant products.
24. Cumulative Revenue by Department Over Time
Task: Calculate the running total of revenue for each department over time.
Answer:
SELECT s.sale_date, CASE WHEN p.product_category LIKE '%Automation%' OR p.product_category LIKE '%Software%' THEN 'Digital Industries' WHEN p.product_category LIKE '%Medical%' THEN 'Healthineers' WHEN p.product_category LIKE '%Rail%' THEN 'Mobility' ELSE 'Other' END AS department, SUM(s.revenue_usd) AS daily_revenue, SUM(SUM(s.revenue_usd)) OVER ( PARTITION BY CASE WHEN p.product_category LIKE '%Automation%' OR p.product_category LIKE '%Software%' THEN 'Digital Industries' WHEN p.product_category LIKE '%Medical%' THEN 'Healthineers' WHEN p.product_category LIKE '%Rail%' THEN 'Mobility' ELSE 'Other' END ORDER BY s.sale_date ) AS cumulative_revenue FROM Sales s JOIN Products p ON s.product_id = p.product_id GROUP BY s.sale_date, department ORDER BY department, s.sale_date;
Explanation:
-
CASE Statement: Maps the
product_category
to a Siemens department (e.g., ‘Medical Imaging’ -> ‘Healthineers’). -
Daily Revenue:
SUM(s.revenue_usd)
grouped bysale_date
and the deriveddepartment
gives the daily revenue per department. -
Cumulative Revenue: The window function
SUM(SUM(...)) OVER (PARTITION BY department ORDER BY sale_date)
calculates a running total. It partitions the data by the derived department and orders it by date, so the sum accumulates over time within each department.
25. Employees Who Haven’t Logged Any Maintenance in the Last Quarter
Task: Find inactive employees in specific departments based on maintenance logs.
Answer:
SELECT e.employee_name, e.department FROM Employees e WHERE e.hire_date < CURRENT_DATE - INTERVAL '3 months' AND e.department IN ('Mobility', 'Smart Infrastructure') AND e.employee_id NOT IN ( SELECT DISTINCT m.technician_id FROM Maintenance_Logs m WHERE m.log_date >= CURRENT_DATE - INTERVAL '3 months' );
Explanation: The subquery finds all technician_id
values that have logged maintenance in the last quarter. The main query selects employees from the relevant departments, hired before the last quarter, whose employee_id
is not found in that subquery list.
26. Average Manufacturing Cost for Products with Revenue Above Average
Task: Find the avg. manufacturing cost for products whose total revenue is above the overall average product revenue.
Answer:
WITH ProductTotals AS ( SELECT product_id, SUM(revenue_usd) AS total_revenue FROM Sales GROUP BY product_id ) SELECT AVG(p.manufacturing_cost_usd) AS average_cost_of_high_revenue_products FROM Products p JOIN ProductTotals pt ON p.product_id = pt.product_id WHERE pt.total_revenue > (SELECT AVG(total_revenue) FROM ProductTotals);
Explanation:
-
CTE (
ProductTotals
): Calculates the total revenue for each product. -
Subquery in WHERE:
(SELECT AVG(total_revenue) FROM ProductTotals)
calculates the average of all individual product revenues. -
Main Query: Joins
Products
with the CTE and filters for products where the total revenue is greater than the overall average calculated by the subquery. Finally, it takes the average of themanufacturing_cost_usd
for these high-revenue products.
27. Percentage of Total Revenue by Customer Segment for Each Month
Task: For each month, show what percentage of total revenue each customer segment contributed.
Answer:
SELECT TO_CHAR(s.sale_date, 'YYYY-MM') AS sales_month, s.customer_segment, SUM(s.revenue_usd) AS segment_revenue, (SUM(s.revenue_usd) * 100.0) / SUM(SUM(s.revenue_usd)) OVER (PARTITION BY TO_CHAR(s.sale_date, 'YYYY-MM')) AS segment_revenue_percentage FROM Sales s GROUP BY sales_month, s.customer_segment ORDER BY sales_month, s.customer_segment;
Explanation: The key is the window function in the percentage calculation. SUM(SUM(s.revenue_usd)) OVER (PARTITION BY sales_month)
calculates the total revenue for each month. Dividing each segment’s monthly revenue by this total and multiplying by 100 gives the percentage contribution of each segment for that month.
28. Identify Consecutive Days of Downtime for Assets
Task: Find assets that had downtime (>0 hours) for 3 or more consecutive days.
Answer:
WITH DowntimeData AS ( SELECT asset_id, log_date, downtime_hours, LAG(log_date, 1) OVER (PARTITION BY asset_id ORDER BY log_date) AS prev_date, LAG(log_date, 2) OVER (PARTITION BY asset_id ORDER BY log_date) AS prev_date_2 FROM Maintenance_Logs WHERE downtime_hours > 0 ) SELECT DISTINCT asset_id, prev_date_2 AS consecutive_downtime_start_date FROM DowntimeData WHERE log_date - prev_date = 1 AND prev_date - prev_date_2 = 1;
Explanation:
-
CTE (
DowntimeData
): Uses theLAG()
function to look back at the previous two log dates for the same asset, but only for records where there was actual downtime. -
Main Query: Filters the results to find rows where the current log date is exactly one day after the previous date
(log_date - prev_date = 1)
, and the previous date is exactly one day after the one before that(prev_date - prev_date_2 = 1)
. This identifies a sequence of three consecutive days. The start of this sequence isprev_date_2
.
29. Department with the Highest Average Salary in Last 5 Years
Task: Find the department with the highest average salary for employees hired in the last 5 years.
Answer:
SELECT department, AVG(salary_usd) AS average_salary_last_5_years FROM Employees WHERE hire_date >= CURRENT_DATE - INTERVAL '5 years' GROUP BY department ORDER BY average_salary_last_5_years DESC LIMIT 1; -- Use TOP 1 for SQL Server, FETCH FIRST 1 ROW ONLY for Oracle
Explanation: The query filters for recent hires, groups them by department, and calculates the average salary for each group. Ordering the results by this average in descending order and taking the first row gives the department with the highest average salary.
30. What is the difference between RANK(), DENSE_RANK(), and ROW_NUMBER()?
These functions are used to assign a sequential integer to rows within a partition of a result set. The difference lies in how they handle ties (rows with identical values in the ORDER BY
clause).
Function | Behavior with Ties | Sequence After a Tie | Example Output for ORDER BY Salary DESC |
---|---|---|---|
ROW_NUMBER() |
Ignores ties. Every row gets a unique number, even if the values are the same. The assignment is arbitrary for tied rows. | Continuous, unique sequence. | 1, 2, 3, 4, 5 |
RANK() |
Recognizes ties. Tied rows get the same rank. | Skips subsequent ranks. The number of skipped ranks depends on the number of ties. | 1, 2, 2, 4, 5 |
DENSE_RANK() |
Recognizes ties. Tied rows get the same rank. | Does not skip ranks. The next rank after a tie is the next consecutive number. | 1, 2, 2, 3, 4 |
Your Example Explained:
Given an employees
table with these salaries: 100, 90, 90, 80, 70
Your SQL query would produce this result:
name | salary | rank | dense_rank | row_num |
---|---|---|---|---|
Alice | 100 | 1 | 1 | 1 |
Bob | 90 | 2 | 2 | 2 |
Charlie | 90 | 2 | 2 | 3 |
David | 80 | 4 | 3 | 4 |
Eve | 70 | 5 | 4 | 5 |
Why this happens:
-
RANK()
: Bob and Charlie are tied for 2nd place. Since two people are in 2nd place,RANK()
skips the next rank (3) and places David at 4. -
DENSE_RANK()
: Bob and Charlie are tied for 2nd place.DENSE_RANK()
then assigns the next consecutive number (3) to David. -
ROW_NUMBER()
: Simply assigns a unique number to each row based on its position in the ordered set. The order for the tied rows (Bob and Charlie) is non-deterministic unless specified further.
Key Takeaway:
-
Use
ROW_NUMBER()
to get a unique identifier for each row. -
Use
RANK()
if you want to know “this person is in the Nth position,” including ties, and are okay with gaps. -
Use
DENSE_RANK()
if you want to know “this person is in the Nth group or tier,” without any gaps in the ranking sequence.
POWER BI:
1. How would you optimize a complex Power BI report with multiple data sources and large datasets?
Goal: Improve report load time and ensure smooth user experience.
Best Practices:
Import Mode vs DirectQuery
Import → loads data into memory (faster).
DirectQuery → real-time but slower (use only when live data required).
Reduce Data Volume
Remove unused columns & rows at Power Query stage.
Pre-aggregate data at SQL/stored procedure level.
Use Star Schema
Fact tables (sales, transactions).
Dimension tables (customer, product).
Avoid snowflake models → more joins → slower queries.
Optimize DAX
Use variables (VAR) to avoid repeated calculations.
Replace FILTER(ALL(…)) with REMOVEFILTERS().
Turn off Auto Date/Time → saves hidden tables.
Use Aggregations → summary tables for faster queries.
Reduce visuals → each visual fires a query.
Performance Analyzer Tool → check slow visuals/measures.
Source-level optimization → push transformations to SQL views/SPs.
Example: If you had a 10M row sales table, you’d:
Create aggregated daily sales at SQL level.
Import summary, not transactions.
Use star schema with dimensions.
2. DAX measure for rolling 12-month average in Power BI
Concept: Rolling averages smooth trends (aka TTM – Trailing Twelve Months).
Step 1: Rolling 12M Sales
Rolling 12M Sales = CALCULATE( [Total Sales], DATESINPERIOD( 'Date'[Date], MAX('Date'[Date]), -12, MONTH ) )
MAX(‘Date'[Date]) → last date in context.
DATESINPERIOD → creates a moving 12-month window.
Step 2: Average
Rolling 12M Average = DIVIDE([Rolling 12M Sales], 12)
Fiscal Year Adjustment:
If fiscal year starts in April, add:
FiscalMonthNumber = MOD(MONTH('Date'[Date]) - 4 + 12, 12) + 1
Use Case: Analyzing yearly sales trend for April–March fiscal year.
3. How do you implement Row-Level Security (RLS) in Power BI?
Definition: Restricts what data users can see.
Steps:
At dataset level (recommended)
Create Roles → Sales[Region] = “East”.
Or dynamic RLS:
Sales[Region] = LOOKUPVALUE(EmployeeRegion[Region], EmployeeRegion[Email], USERPRINCIPALNAME())
Publish dataset → Assign users to roles in Power BI Service.
At report level
Not directly supported. Workarounds:
Create separate reports with filtered datasets.
Use shared dataset with RLS enforced.
Scenario:
Sales Manager (East) logs in → sees only East region sales.
No need for multiple reports.
4. How do you handle Slowly Changing Dimensions (SCD) in SQL for Power BI?
Definition: Dimension attributes change slowly (e.g., customer address).
Types:
Type 1: Overwrite (no history).
MERGE INTO DimCustomer AS Target USING StagingCustomer AS Source ON Target.CustomerID = Source.CustomerID WHEN MATCHED THEN UPDATE SET Target.Address = Source.Address;
Type 2: Add new row (history kept).
UPDATE DimCustomer SET EndDate = GETDATE() WHERE CustomerID = @CustomerID AND EndDate IS NULL; INSERT INTO DimCustomer (CustomerID, Name, Address, StartDate, EndDate) VALUES (@CustomerID, @NewName, @NewAddress, GETDATE(), NULL);
Type 3: Add new columns (limited history).
In Power BI: Join fact table to active record (EndDate IS NULL) or historical snapshot with date filters.
5. How to implement Incremental Refresh in Power BI?
Why: Avoid refreshing 10 years daily → refresh only recent data.
Steps:
Create parameters: RangeStart, RangeEnd.
Use them in Power Query:
= Table.SelectRows(Source, each [SaleDate] >= RangeStart and [SaleDate] < RangeEnd)
Define policy in Power BI Service:
Store past 5 years.
Refresh last 1 day/week.
Use Case: Sales dataset from 2010 → refresh only yesterday’s data daily.
6. How to create custom visuals in Power BI using R/Python?
Steps:
Enable R/Python scripting.
Insert R/Python visual.
Write code (example in Python):
import seaborn as sns import matplotlib.pyplot as plt sns.boxplot(x='Category', y='Sales', data=dataset) plt.show()
Considerations:
Limited to 150K rows.
Slower than native visuals.
Non-interactive with slicers.
Security (code execution).
Use Case: Heatmap correlation of sales by region.
Q7. What are the components of Power BI?
Answer:
Power BI has five main components which together provide a complete BI solution:
-
Power Pivot – Fetches, cleans, and loads data into the model.
-
Power Query – Performs transformations (filter, merge, split, reshape).
-
Power Q&A – Lets users query data in natural language (English).
-
Power View – For building interactive reports and visuals.
-
Power Map – For geographic and 3D spatial data visualization.
Together, these components enable data preparation → modeling → visualization → sharing.
Q8. How is Power BI different from Tableau?
Answer:
Both are BI tools, but differ in pricing, usability, and features:
-
Pricing: Power BI is cheaper (especially for larger teams) compared to Tableau.
-
Ease of Use: Power BI is simpler for Microsoft users, Tableau offers advanced customization.
-
Data Connectivity: Tableau has slightly more robust data prep and connection features.
-
Visualization: Tableau is more flexible; Power BI provides easier-to-use visuals.
-
Integration: Power BI integrates seamlessly with Excel, Office 365, SharePoint, Teams.
If cost and Microsoft integration matter, choose Power BI. If advanced analytics and customization matter, choose Tableau.
Q9. How is Power BI different from other BI tools?
Answer:
-
Power BI allows connecting to multiple data sources without coding.
-
Deep integration with Microsoft tools (Office 365, Bing, Azure).
-
Free desktop version allows:
-
Analysis of datasets up to 1GB.
-
10,000 rows/hour of streaming data.
-
-
Provides natural language queries through Power Q&A.
-
Compared to tools like QlikView/Tableau, Power BI is more cost-effective and beginner-friendly.
Q10. What is DAX Function?
Answer:
DAX (Data Analysis Expressions) is a formula language used in Power BI, Power Pivot, and SSAS.
-
It is a functional language – every calculation is wrapped in a function.
-
Supported data types: Numeric, Boolean, Date/Time, String, Decimal.
-
Example:
-
Use Cases: Create measures (KPIs), calculated columns, aggregations.
Q11. What are the types of filters available in Power BI Reports?
Answer:
Power BI supports three levels of filters:
-
Visual-level filter – Applies only to one chart/visual.
-
Page-level filter – Applies to all visuals on a page.
-
Report-level filter – Applies to the entire report (all pages).
This layered filtering gives flexibility in scope and granularity.
Q12. What is the CALCULATE function in DAX?
Answer:
The CALCULATE()
function modifies the filter context of an expression.
Syntax:
-
Expression: The calculation to evaluate.
-
Filter(s): Boolean or table expressions to apply.
Example:
Here, only rows where Country = India are considered.
Q13. What are the different challenges faced by a Power BI developer?
Answer:
-
No strong data cleansing features compared to ETL tools.
-
Struggles with large datasets (performance drops after ~20–30k rows in some cases).
-
Complex DAX can impact performance.
-
Bulky UI for beginners.
-
Governance & security (especially in large enterprises).
Q14. How can you do automation in Power BI?
Answer:
Automation is done via Power Automate (Microsoft Flow).
-
Cloud-based automation tool.
-
Supports 200+ connectors (SQL, SharePoint, Outlook, etc.).
-
Automates:
-
Scheduled refresh.
-
Notifications (e.g., email when sales fall below threshold).
-
Data movement between apps.
-
Example: Auto-send a sales dashboard PDF to managers every Monday morning.
Q15. Can you explain the bookmark function of Power BI?
Answer:
Bookmarks capture the current state of a report (filters, slicers, visuals).
-
Use Cases:
-
Save different views (e.g., Sales by Region vs Sales by Product).
-
Create presentations by navigating through bookmarks.
-
Custom navigation for end-users.
-
Q16. Explain the major concept of Power BI.
Answer:
Power BI is a self-service, cloud-based BI tool for data visualization and sharing.
-
Connects to multiple sources.
-
Allows transformation & modeling.
-
Provides interactive dashboards & KPIs.
-
Accessible via desktop, web, and mobile.
-
Strong focus on collaboration & sharing.
Q17. What is Power BI Desktop?
Answer:
Power BI Desktop is a Windows application (free) used for building BI reports.
-
Features: Import, clean, model, visualize data.
-
Can publish reports to Power BI Service.
-
Works offline (unlike Power BI Service which is cloud-based).
-
File format: .pbix
Q18. What is Get Data in Power BI?
Answer:
-
A feature to connect Power BI to various data sources.
-
Found in the Menu bar of Power BI Desktop.
-
Supports: Local files (Excel, CSV), cloud sources (Azure, Salesforce), web sources, databases (SQL, Oracle).
-
Enables both live connection and import mode.
Q19. What are the building blocks of Power BI?
Answer:
Five building blocks:
-
Visualizations – Graphical representations.
-
Datasets – Collections of data.
-
Reports – Multi-page visual collections.
-
Dashboards – One-page summary with pinned visuals.
-
Tiles – Single visualization on a dashboard.
Q20. What are visualizations in Power BI?
Answer:
-
Visual representation of data.
-
Power BI offers built-in and custom visuals.
-
Examples: Bar chart, Line chart, Pie, Waterfall, Funnel, Map, Gauge, Ribbon.
-
Custom visuals available via marketplace.
Q21. What are reports in Power BI?
Answer:
Reports = collection of visuals related to a dataset.
-
Can span multiple pages.
-
Help in deep analysis.
-
Can be shared via Power BI Service.
-
Example: A sales report may contain visuals for revenue trend, region-wise sales, and product KPIs.
Q22. What do you understand by dashboards in Power BI?
Answer:
Dashboards are single-page summaries made by pinning visuals from reports.
-
Always one page.
-
Used for quick decision-making.
-
Example: Executive dashboard showing Revenue, Expenses, Profit, and Growth Rate on one screen.
Q23. What are the KPIs in Power BI?
Answer:
-
Key Performance Indicators (KPIs) measure progress against a goal.
-
Always has:
-
Base Value (Measure)
-
Target Value
-
-
Graphically shows performance (green = good, red = bad).
-
Example: Actual Sales vs Target Sales.
Q24. How many versions of Power BI are available as of now? How are they different?
Answer:
Currently three editions are available:
-
Power BI Desktop (Free):
-
Authoring & development tool.
-
Connect, clean, visualize, and create reports.
-
File format: .pbix.
-
-
Power BI Pro (Paid, ~$9.99/user/month):
-
Enables collaboration, sharing, publishing.
-
Ad-hoc analysis.
-
-
Power BI Premium (Enterprise):
-
Advanced analytics, big data, AI support.
-
Dedicated cloud & on-prem capacity.
-
High-performance, scalable solution.
-
Q25. What are Slicers in Power BI?
Answer:
-
Slicers = visual filters placed on the report canvas.
-
Allow users to interactively filter data.
-
Example: A slicer for “Year” lets users toggle between 2021, 2022, etc.
-
Unlike background filters, slicers are visible & user-controlled.
Q26. Discuss about the Content Packs in Power BI.
Answer:
Content Packs are pre-built collections of:
-
Datasets
-
Reports
-
Dashboards
-
Excel Workbooks
Two types:
-
Service Provider Content Packs – e.g., Salesforce, Google Analytics.
-
User-Created Content Packs – Built & shared within organizations.
Purpose: Quick sharing & deployment of BI solutions.
Q27. What are Custom Visuals in Power BI?
Answer:
Custom Visuals are user-developed visuals built using Power BI SDK.
-
Created with JavaScript libraries (D3.js, jQuery, R, Python).
-
Examples: Sankey charts, Gantt charts, Bullet graphs.
-
Available on Microsoft AppSource marketplace.
-
Extend visualization beyond built-in charts.
Q28. What data sources can you connect to Power BI?
Answer:
Power BI supports wide connectivity:
-
Files: Excel (.xlsx), CSV (.csv), PBIX.
-
Content Packs: From providers (Salesforce, Google Analytics) or internal users.
-
Databases: SQL Server, Oracle, MySQL, PostgreSQL, Azure SQL DB.
-
Connectors: Azure Analysis Services, OData, APIs, web sources.
Q29. How are a Power BI dashboard and a report different from each other?
Answer:
Feature | Dashboard | Report |
---|---|---|
Pages | Always one page | Can have multiple pages |
Data Sources | Multiple datasets | One dataset |
Filtering | No filtering | Filtering & slicing allowed |
Alerts | Supports alerts | No alerts |
Dataset Access | Cannot view underlying data | Can view tables & fields |
Q30. What is Power Pivot?
Answer:
Power Pivot is an in-memory component in Power BI.
-
Stores compressed data for faster access.
-
Builds data models & relationships.
-
Creates calculated columns & measures.
-
Supports large datasets efficiently.
Q31. What is Power View?
Answer:
-
A visualization technology for interactive reports.
-
Available in Excel, SQL Server, SharePoint, Power BI.
-
Supports charts, graphs, maps, and slicers.
-
Useful for exploring and storytelling with data.
Q32. Define Power Query.
Answer:
Power Query is an ETL tool for shaping and transforming data.
-
Provides GUI to clean, merge, filter, split, pivot/unpivot.
-
Generates M language code in the background.
-
Allows combining multiple sources.
Q33. What is the major difference between a Filter and a Slicer?
Answer:
-
Filter: Applied in the background, not visible to end-users.
-
Slicer: Visible, interactive filter element placed on the report canvas.
Slicers are more user-friendly & dynamic.
Q34. What is the Power BI Service?
Answer:
Cloud-based platform for:
-
Publishing & sharing reports.
-
Collaboration via Workspaces.
-
Scheduled refresh.
-
Row-Level Security.
-
Access on web & mobile.
Q35. Share some key points about DAX usage in Power BI.
Answer:
-
Used for calculated columns, measures, tables.
-
Similar to Excel formulas but more powerful.
-
Works with row context & filter context.
-
Enables KPIs like ratios, YoY growth, moving averages.
Q36. Which table functions will you use to group data in Power BI?
Answer:
-
SUMMARIZE(): Groups data by one or more columns.
-
SUMMARIZECOLUMNS(): Optimized grouping function in Power BI.
Q37. Explain the concept of Power BI DAX.
Answer:
DAX has three fundamentals:
-
Syntax – Formula rules (e.g.,
= SUM(Sales[Amount])
). -
Context – Row Context (row by row), Filter Context (applied filters).
-
Functions – Predefined (SUM, AVERAGE, CALCULATE, IF, FILTER).
Q38. Explain some commonly used DAX functions.
Answer:
-
Date/Time: DATE, HOUR.
-
Time Intelligence: DATESYTD, PREVIOUSMONTH.
-
Logical: IF, AND, OR.
-
Math: DIVIDE, ROUND.
-
Statistical: COUNTROWS, AVERAGE.
-
Text: CONCATENATE, LEFT, FORMAT.
-
Table: FILTER, DISTINCT, VALUES.
Q39. What is a Summarize function in DAX?
Answer:
Used to aggregate and group data.
Syntax:
Example:
Q40. What are DAX Calculation types?
Answer:
Two main types:
-
Calculated Columns: Row-by-row calculations.
-
Example:
Profit = Sales[Revenue] - Sales[Cost]
.
-
-
Measures: Aggregations across entire dataset.
-
Example:
Total Sales = SUM(Sales[Revenue])
.
-
Q41. Explain Power BI Query Editor.
Answer:
-
Tool for data transformation before loading.
-
Functions: Remove columns, merge, pivot/unpivot, filter.
-
Does not change source data – only Power BI dataset.
Q42. Are Power View and Power Query the same?
Answer:
-
Power Query: Used for data preparation (ETL).
-
Power View: Used for visualization & reporting.
Q43. What is Parameter in Power BI?
Answer:
-
A dynamic value used in queries/reports.
-
Can be reused across multiple queries.
-
Example: Instead of hardcoding year = 2023, create a Year Parameter.
Q44. How to sort data in Power BI?
Answer:
-
Alphabetical / numeric sort.
-
Sort by Column (e.g., Month sorted by MonthNumber).
-
Custom DAX-based sort.
Q45. How to combine two columns in Power BI?
Answer:
Use DAX Calculated Column:
Q46. What is Power BI Data Gateway? How can we use it?
Answer:
Bridge between on-premises data and Power BI Service.
-
Types: Personal & Enterprise gateway.
-
Uses:
-
Secure data transfer.
-
Required for scheduled refresh.
-
-
Example: Refresh on-prem SQL Server data in cloud dashboard.
Q47. How is the Filter function used in Power BI?
Answer:
The FILTER()
DAX function restricts rows of a table.
Syntax:
Example:
Returns only sales where Country = India.
At Learnomate Technologies, we don’t just teach tools—we train you with real-world, hands-on knowledge that sticks. Our Data Analyst training program is designed to help you crack job interviews, build impactful projects, and grow confidently in your data-driven career.
Want to see how we teach? Hop over to our YouTube channel for bite-sized tutorials, student success stories, and data analysis techniques explained in simple English.
Ready to get certified and hired? Check out our Data Analyst course page for full curriculum details, placement assistance, and batch schedules.
Curious about who’s behind the scenes? I’m Ankush Thavali, founder of Learnomate and your guide for all things data, analytics, and visualization. Let’s connect on LinkedIn—I regularly share practical insights, job alerts, and learning tips to keep you ahead of the curve.
And hey, if this article got your curiosity going…
👉 Explore more on our blog, where we simplify complex topics like SQL, Python, Power BI, Excel, and advanced analytics.
Thanks for reading! Now it’s time to turn this knowledge into action. Happy learning, and see you in class or in the next blog!
Happy Analyzing!
ANKUSH