
Project Management Basics

Click ★ if you like the project. Your contributions are heartily ♡ welcome.


Table of Contents


Questions

| No. | Questions |
|-----|-----------|
| 1 | What are the scientific ways to do project estimation? |
| 2 | What is planning poker estimation technique? |
| 3 | What is Ballpark Figures estimate? |
| 4 | What are the tools used for requirements gathering? |
| 5 | Explain the concept of RAID in project management? |
| 6 | What are the techniques used to define the scope of a project? |
| 7 | Explain Ishikawa/Fishbone diagrams? |
| 8 | What is the process of calculating the three-point estimating method? |
| 9 | What is Work Breakdown Structure (WBS)? |
| 10 | What is the Pareto principle analysis? |
| 11 | What is Gherkin approach for writing user stories? |
| 12 | What are the roles and responsibilities of a Technical Lead? |
| 13 | What is the difference between Agile and Waterfall methodologies? |
| 14 | What is a Sprint Retrospective and how is it conducted? |
| 15 | What is the Critical Path Method (CPM) in project management? |
| 16 | What is Earned Value Management (EVM)? |
| 17 | What is a Project Charter and what does it contain? |
| 18 | What is the difference between a Product Owner and a Project Manager? |
| 19 | What is Change Management in project management? |
| 20 | What are DORA metrics and why do they matter? |
| 21 | Software Architect Interview Questions |


Q. What are the scientific ways to do project estimation?

There are many different estimation techniques used in project management across domains such as Engineering, IT, Construction, Agriculture, and Accounting. A project manager is challenged to balance six project constraints (Scope, Time, Cost, Quality, Resources, and Risk) in order to estimate the project accurately.

While accurate estimates are the basis of sound project planning, many techniques are used as project management best practices in estimation, including Analogous estimation, Parametric estimation, the Delphi method, Three-Point estimation, Expert Judgment, Published Data Estimates, Vendor Bid Analysis, Reserve Analysis, Bottom-Up Analysis, and Simulation. During the early stages of a project life cycle, the requirements are only vaguely known and little information is available for estimating. The initial estimate is drawn largely from assumptions, with the scope known only at a high level; this is the 'ballpark estimate', a term project managers use very often.

Top-down estimate:
This technique is used once more detail about the project scope is known: high-level chunks at the feature or design level are estimated and progressively decomposed into smaller chunks or work packets as more information becomes available.

Bottom-up estimate:
This technique is used when the requirements are known only at a discrete level: the smaller work pieces are estimated first and then aggregated to produce the estimate for the entire project.

Analogous estimating:
This technique is used when a similar, previously executed project can serve as a reference. Expert judgment and historical information from comparable activities in the reference project are used to arrive at an estimate.

Parametric estimate:
This technique uses independent, measurable variables from the project work. For example, the cost of constructing a building can be calculated from the cost per square foot, and the effort required to build a work packet can be derived from a variable such as lines of code in a software development project. This technique gives more accurate project estimates.

Three-point estimating:
This technique takes the weighted average of an optimistic, a most likely, and a pessimistic estimate of the work package. It is often known as PERT (Program Evaluation and Review Technique).

What-if analysis:
This technique evaluates possible project outcomes by varying assumptions about factors such as scope, time, cost, and resources, and analyzing the impact of each combination. In practice, the project estimate is produced in estimation workshops with project stakeholders and senior team members who can give valuable input to the exercise. The high-level scope is broken down into smaller work packages, components, and activities, and each work package is estimated by the effort and resources needed to complete it, down to the smallest chunk that can be measured.

T-shirt Sizing (Modern Agile):
A relative estimation technique where work items are sized as XS, S, M, L, XL, or XXL instead of hours or story points. It is fast, collaborative, and avoids false precision. Widely used in SAFe (Scaled Agile Framework) PI Planning and early backlog grooming when detailed estimates are not yet feasible.

Monte Carlo Simulation (Advanced):
A probabilistic estimation technique that runs thousands of simulated project scenarios using historical velocity data and task variability to generate a probability distribution of completion dates and costs. Used in risk-heavy and large-scale projects for confidence-interval forecasting (e.g., “there is an 85% probability of delivery by Q3”).

AI-Assisted Estimation (Emerging Practice):
Modern teams leverage AI tools (GitHub Copilot, Jira AI, Azure DevOps Copilot) to assist in effort estimation by analyzing historical project data, comparing similar past user stories, and suggesting story point ranges. AI does not replace expert judgment but reduces anchoring bias and speeds up estimation workshops.

PMBOK 7th Edition Note: The latest PMI standard (2021) shifts from prescriptive processes to 12 principles and 8 performance domains, emphasizing adaptability and outcomes over rigid methodologies. Estimation is now treated as a continuous activity rather than a one-time planning event, aligned with hybrid and agile delivery models.

↥ back to top

Q. What is planning poker estimation technique?

Planning poker, also called Scrum poker, is a consensus-based, gamified estimation technique, used mostly to estimate the effort or relative size of development goals in software development. To start a planning poker session, the product owner or customer reads an agile user story or describes a feature to the estimators.

Each estimator holds a deck of Planning Poker cards with values like 0, 1, 2, 3, 5, 8, 13, 20, 40, and 100, a commonly used sequence. The values represent the number of story points, ideal days, or other units in which the team estimates.

The estimators discuss the feature, asking questions of the product owner as needed. When the feature has been fully discussed, each estimator privately selects one card to represent his or her estimate. All cards are then revealed at the same time.

If all estimators selected the same value, that becomes the estimate. If not, the estimators discuss their estimates. The high and low estimators should especially share their reasons. After further discussion, each estimator re-selects an estimate card, and all cards are again revealed at the same time.

The poker planning process is repeated until consensus is achieved or until the estimators decide that agile estimating and planning of a particular item needs to be deferred until additional information can be acquired.

Planning poker combines three methods of estimation: expert opinion, analogy, and disaggregation (breaking large items into smaller, more easily estimated pieces).
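
As a toy illustration of the reveal step, the sketch below models a single round in JavaScript; the function and its output shape are hypothetical, not part of any planning-poker tool.

```js
// One planning-poker round: all cards are revealed at once.
// If everyone picked the same card, that value becomes the estimate;
// otherwise the high and low estimators explain their reasoning
// and the team re-estimates.
function revealRound(estimates) {
  if (new Set(estimates).size === 1) {
    return { consensus: true, estimate: estimates[0] };
  }
  return {
    consensus: false,
    low: Math.min(...estimates),
    high: Math.max(...estimates),
  };
}

console.log(revealRound([5, 8, 5, 13])); // { consensus: false, low: 5, high: 13 }
```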

Modern Enhancements to Planning Poker:

↥ back to top

Q. What is Ballpark Figures estimate?

A ballpark figure is a rough numerical estimate or approximation of the value of something that is otherwise unknown. Ballpark figures are commonly used by accountants, salespeople, and other professionals to estimate current or future results. A stockbroker could use a ballpark figure to estimate how much money a client might have at some point in the future, given a certain rate of growth. A salesperson could use a ballpark figure to estimate how long a product that a customer is considering might remain viable.

A ballpark figure is essentially a placeholder: an approximation of what the amount or total of something might be, established so that the parties involved can move forward in whatever negotiation or planning is underway. As a concept, it has applications in business estimates as well as in everyday life.

Key takeaways:

- Ballpark figures are estimates used to move a discussion or deal forward when the exact measurement of the size or amount of something cannot yet be determined.
- Ballpark figures can be used for day-to-day purposes, such as estimating how much food and beverages might be needed for a barbecue or how many months it will likely take to pay off a new purchase.
- Ballpark figures are also used everywhere in the business world, for example to estimate how much it might cost to expand into a certain market, how many years it might take for a company to become profitable, or how large sales must be to justify a big purchase. They can also be used to estimate public adoption of a concept, technology, or product, as in how many people are likely to buy a certain phone and how long it might take them to upgrade it once purchased.

Q. What are the tools used for requirements gathering?

Requirements gathering is the process of identifying and documenting the needs of stakeholders for a new or modified product or system. Commonly used tools and techniques include:

- Stakeholder interviews and focus groups
- Questionnaires and surveys
- Facilitated requirements workshops (e.g., JAD sessions)
- Brainstorming sessions
- Observation / job shadowing
- Document and process analysis
- Prototypes, wireframes, and mockups
- Use cases and user stories
- Requirements management tools (e.g., Jira, Azure DevOps, Confluence)

↥ back to top

Q. Explain the concept of RAID in project management?

RAID is an acronym that stands for Risks, Assumptions, Issues, and Dependencies. It is a project management tool used to track and manage the key factors that can affect the successful delivery of a project.

A RAID log is a living document, maintained and reviewed regularly throughout the project lifecycle to ensure proactive management of these factors.

RAIDC — Extended Modern Variant:

Some organizations extend the RAID acronym to RAIDC by adding:

Modern RAID Tooling:

| Tool | Usage |
|------|-------|
| Jira | Risks and issues tracked as tickets with labels, priorities, and owners |
| Azure DevOps | Work items and risk registers integrated with sprint boards |
| Confluence / SharePoint | RAID log maintained as a shared, searchable document |
| Monday.com / Smartsheet | Visual RAID dashboards with status tracking and notifications |

RAID in Hybrid and Agile Projects:
In Agile teams, RAID items are surfaced during daily standups (issues), sprint retrospectives (risks and assumptions), and backlog refinement (dependencies). In SAFe, RAID is managed at the Program Increment (PI) level and tracked on the ART (Agile Release Train) risk board using the ROAM technique, where each risk is classified as Resolved, Owned, Accepted, or Mitigated.

↥ back to top

Q. What are the techniques used to define the scope of a project?

Project scope definition involves clearly documenting the boundaries of a project — what is included and what is excluded. Common techniques include:

- Product breakdown and product analysis
- Requirements workshops and stakeholder interviews
- Alternatives analysis and expert judgment
- MoSCoW prioritisation (Must / Should / Could / Won't have)
- A written scope statement listing deliverables, exclusions, constraints, and assumptions
- Decomposition into a Work Breakdown Structure (WBS)

↥ back to top

Q. Explain Ishikawa/Fishbone diagrams?

An Ishikawa diagram, also called a Fishbone diagram or Cause-and-Effect diagram, is a visual tool used to systematically identify and analyze the root causes of a problem or defect. It was developed by Japanese quality control expert Kaoru Ishikawa.

The diagram resembles a fish skeleton: the problem (effect) sits at the fish's head, the major cause categories branch off the central spine as large bones, and detailed causes attach to each branch as smaller bones.

Common cause categories (the 6 Ms used in manufacturing):

| Category | Description |
|----------|-------------|
| Man | Human factors — skills, training, fatigue |
| Machine | Equipment, tools, technology |
| Method | Processes, procedures, workflows |
| Material | Raw materials, components, data |
| Measurement | Metrics, calibration, data accuracy |
| Mother Nature (Environment) | Environmental conditions, workspace |

Steps to create a Fishbone diagram:

  1. Define and write the problem (effect) at the fish head.
  2. Identify the main cause categories and draw them as branches.
  3. Brainstorm potential causes for each category using the “5 Whys” technique.
  4. Analyze the diagram to identify the most likely root causes.
  5. Prioritize causes for further investigation and corrective action.

Benefits:

↥ back to top

Q. What is the process of calculating the three-point estimating method?

The three-point estimating method improves the accuracy of estimates by considering uncertainty and risk. Instead of a single estimate, three scenarios are defined for each task:

| Estimate | Symbol | Description |
|----------|--------|-------------|
| Optimistic | O | Best-case scenario — everything goes perfectly |
| Most Likely | M | The realistic, most probable outcome |
| Pessimistic | P | Worst-case scenario — maximum problems occur |

Two formulas are used:

1. Triangular Distribution (Simple Average):

\[E = \frac{O + M + P}{3}\]

2. Beta Distribution (PERT — Program Evaluation and Review Technique):

\[E = \frac{O + 4M + P}{6}\]

The PERT formula gives four times the weight to the most likely estimate, making it more accurate for project planning.

Standard Deviation (for PERT):

\[SD = \frac{P - O}{6}\]

Example: A software module has the following estimates: Optimistic (O) = 4 days, Most Likely (M) = 6 days, Pessimistic (P) = 14 days.

Triangular estimate: \(E = \frac{4 + 6 + 14}{3} = 8 \text{ days}\)

PERT estimate: \(E = \frac{4 + (4 \times 6) + 14}{6} = \frac{42}{6} = 7 \text{ days}\)

Standard Deviation: \(SD = \frac{14 - 4}{6} \approx 1.67 \text{ days}\)
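
For a quick sanity check of the formulas above, here is a minimal JavaScript sketch (the helper name and output shape are illustrative):

```js
// Three-point estimation: triangular average, PERT weighted average,
// and PERT standard deviation, all from the same O/M/P inputs.
function threePoint(o, m, p) {
  return {
    triangular: (o + m + p) / 3, // simple average
    pert: (o + 4 * m + p) / 6,   // beta/PERT weighted average
    stdDev: (p - o) / 6,         // PERT standard deviation
  };
}

console.log(threePoint(4, 6, 14));
// { triangular: 8, pert: 7, stdDev: 1.666... } — matches the worked example
```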

Benefits:

↥ back to top

Q. What is Work Breakdown Structure (WBS)?

A Work Breakdown Structure (WBS) is a hierarchical decomposition of the total scope of work required to complete a project. It organizes and defines the project's total work into smaller, manageable components called work packages.

Key characteristics:

WBS Structure Levels:

| Level | Description | Example |
|-------|-------------|---------|
| Level 1 | Project | E-Commerce Website |
| Level 2 | Major Deliverables | Frontend, Backend, Database |
| Level 3 | Sub-deliverables | UI Design, API Development |
| Level 4 | Work Packages | Login Page, Product Listing API |

Types of WBS:

Benefits of WBS:

WBS Dictionary: A companion document that provides detailed information about each WBS element including description, responsible party, schedule milestones, required resources, and acceptance criteria.

Agile WBS / Product Breakdown Structure (PBS):
In Agile and hybrid projects, the traditional WBS is complemented or replaced by:

Modern WBS Tooling:

| Tool | Capability |
|------|------------|
| Microsoft Project | Traditional WBS with Gantt chart and resource planning |
| Jira | Hierarchical backlog (Epic → Story → Sub-task) as Agile WBS |
| Azure DevOps | Work item hierarchy (Epic → Feature → User Story → Task) |
| Asana / Monday.com | Visual WBS with timeline and dependency tracking |
| Miro / FigJam | Collaborative WBS whiteboarding for remote teams |

↥ back to top

Q. What is the Pareto principle analysis?

The Pareto Principle, also known as the 80/20 Rule, states that roughly 80% of effects come from 20% of causes. It was named after Italian economist Vilfredo Pareto, who observed that 80% of Italy's land was owned by 20% of the population.

In project and quality management, this principle is applied through Pareto Analysis — a technique used to identify and prioritize the most significant problems or causes to focus effort where it will have the greatest impact.

How Pareto Analysis works:

  1. Identify and list the problems or causes to be analyzed.
  2. Measure the frequency or impact of each problem (e.g., number of defects, cost, time lost).
  3. Sort in descending order from highest to lowest frequency/impact.
  4. Calculate cumulative percentages for each problem category.
  5. Draw a Pareto Chart — a bar chart combined with a line graph showing cumulative percentages.
  6. Identify the vital few — the 20% of causes responsible for 80% of the problems.
  7. Focus corrective action on those top causes first.

Example in Software Projects:

| Bug Category | Count | Cumulative % |
|--------------|-------|--------------|
| UI/UX Issues | 45 | 45% |
| API Errors | 30 | 75% |
| Database Issues | 15 | 90% |
| Config Errors | 7 | 97% |
| Other | 3 | 100% |

Fixing UI/UX and API issues (20% of categories) resolves 75% of all bugs.
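
The cumulative-percentage step is easy to automate. A minimal JavaScript sketch using the table above (data and field names are illustrative):

```js
// Pareto analysis: sort causes by impact and compute cumulative percentages.
const bugs = [
  { category: 'UI/UX Issues', count: 45 },
  { category: 'API Errors', count: 30 },
  { category: 'Database Issues', count: 15 },
  { category: 'Config Errors', count: 7 },
  { category: 'Other', count: 3 },
];

const total = bugs.reduce((sum, b) => sum + b.count, 0);
let running = 0;
const pareto = bugs
  .sort((a, b) => b.count - a.count) // highest impact first
  .map((b) => {
    running += b.count;
    return { ...b, cumulativePct: Math.round((running / total) * 100) };
  });

console.table(pareto); // the top rows are the "vital few"
```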

Applications in project management:

↥ back to top

Q. What is Gherkin approach for writing user stories?

Gherkin is a plain-text, human-readable language used to write acceptance criteria and user stories in a structured format that both business stakeholders and developers can understand. It is the language used by Behavior-Driven Development (BDD) frameworks such as Cucumber, SpecFlow, and Behave.

Gherkin bridges the gap between business requirements and automated test specifications by expressing behavior in natural language using a fixed set of keywords.

Core Keywords:

| Keyword | Purpose |
|---------|---------|
| Feature | High-level description of the feature being tested |
| Scenario | A specific example or test case for the feature |
| Given | The initial context or precondition |
| When | The action or event that occurs |
| Then | The expected outcome or result |
| And / But | Chains multiple Given/When/Then steps |
| Background | Common steps shared across all scenarios in a feature |
| Scenario Outline | A template scenario run with multiple data sets |
| Examples | Data table used with Scenario Outline |

Basic Syntax Example:

```gherkin
Feature: User Login

  Scenario: Successful login with valid credentials
    Given the user is on the login page
    When the user enters a valid username "john@example.com"
    And the user enters a valid password "Secret@123"
    Then the user should be redirected to the dashboard
    And a welcome message "Hello, John" should be displayed
```

Scenario Outline with Examples (data-driven):

```gherkin
Feature: User Login Validation

  Scenario Outline: Login with different credentials
    Given the user is on the login page
    When the user enters username "<username>" and password "<password>"
    Then the login result should be "<result>"

  Examples:
    | username            | password    | result  |
    | john@example.com    | Secret@123  | success |
    | wrong@example.com   | Secret@123  | failure |
    | john@example.com    | wrongpass   | failure |
```

Background Example (shared precondition):

```gherkin
Feature: Shopping Cart

  Background:
    Given the user is logged in
    And the shopping cart is empty

  Scenario: Add item to cart
    When the user adds "Laptop" to the cart
    Then the cart should contain 1 item

  Scenario: Remove item from cart
    Given the user has "Laptop" in the cart
    When the user removes "Laptop" from the cart
    Then the cart should be empty
```

Benefits of Gherkin:

Gherkin vs Traditional User Stories:

| Aspect | Traditional User Story | Gherkin Scenario |
|--------|------------------------|------------------|
| Format | As a [role], I want [goal], so that [benefit] | Given/When/Then |
| Audience | Business + Dev | Business + Dev + QA |
| Testable | Not directly | Directly executable |
| Tooling | Jira, Trello | Cucumber, SpecFlow, Behave |

Modern Gherkin: The Rule Keyword (Gherkin 6+):
The Rule keyword was introduced to group related scenarios under a single business rule within a feature. This makes large feature files more organized and readable.

```gherkin
Feature: User Account Management

  Rule: Users must be verified before accessing premium features

    Scenario: Verified user accesses premium content
      Given the user has verified their email
      When the user navigates to the premium section
      Then access is granted

    Scenario: Unverified user attempts premium access
      Given the user has NOT verified their email
      When the user navigates to the premium section
      Then the user should see a verification prompt
```

Gherkin + AI-Assisted BDD (Emerging Practice):
Modern teams use AI tools to accelerate BDD adoption:

Gherkin in CI/CD Pipelines:
Gherkin scenarios are integrated into modern DevOps pipelines (GitHub Actions, Azure Pipelines, Jenkins) where Cucumber or SpecFlow runs acceptance tests automatically on every pull request, providing living documentation that always reflects the current system behavior.

↥ back to top

Q. What are the roles and responsibilities of a Technical Lead?

A Technical Lead (Tech Lead) is a senior engineering role responsible for guiding the technical direction of a team or project. Unlike a pure individual contributor, the Tech Lead balances hands-on coding with leadership, mentoring, and coordination responsibilities. The role sits at the intersection of engineering and management.

Core Responsibilities:

1. Technical Direction and Architecture

2. Code Quality and Standards

3. Technical Planning and Estimation

4. Mentoring and Team Development

5. Cross-functional Collaboration

6. Delivery and Execution

7. Security and Compliance

Tech Lead vs. Engineering Manager:

| Dimension | Technical Lead | Engineering Manager |
|-----------|----------------|---------------------|
| Primary focus | Technical excellence + delivery | People management + org health |
| Coding | Active contributor (30–70%) | Minimal or none |
| Reports to | Engineering Manager or CTO | VP Engineering or CTO |
| Hiring | Contributes to interviews | Owns hiring decisions |
| Performance reviews | Input provider | Owner |
| Career path | Staff Engineer → Principal → Architect | EM → Director → VP |

Modern Tech Lead Skills (2024–2026):

↥ back to top

Q. What is the difference between Agile and Waterfall methodologies?

Agile and Waterfall are two fundamentally different approaches to software project delivery. Choosing between them — or combining them in a hybrid model — depends on the nature of the project, stakeholder needs, and the level of requirements certainty.

Waterfall:
A sequential, linear project management approach where each phase (Requirements → Design → Development → Testing → Deployment → Maintenance) must be completed before the next begins. Requirements are locked upfront and change is costly.

Agile:
An iterative, incremental delivery approach where work is broken into short cycles (sprints/iterations), enabling continuous feedback, adaptation, and delivery of working software frequently.

Comparison:

| Dimension | Waterfall | Agile |
|-----------|-----------|-------|
| Approach | Sequential, phase-gated | Iterative, incremental |
| Requirements | Fixed upfront | Evolving throughout |
| Delivery | Single delivery at project end | Frequent releases (every 1–4 weeks) |
| Customer involvement | At start and end | Continuous collaboration |
| Change tolerance | Low — changes are costly | High — embraces change |
| Documentation | Heavy upfront documentation | Lightweight, just-enough docs |
| Team structure | Siloed (BA, Dev, QA in sequence) | Cross-functional, self-organizing |
| Risk management | Risks identified upfront | Risks surfaced and addressed iteratively |
| Best suited for | Fixed scope, regulated projects | Complex, innovative, fast-changing products |
| Examples | Construction, compliance systems | SaaS products, mobile apps, AI systems |

Hybrid (Water-Scrum-Fall):
Many enterprise organisations adopt a hybrid model — Waterfall governance (fixed budget, milestones, contracts) wrapped around Agile execution (sprints, backlog, daily standups). This is common in SAFe implementations.

When to choose Waterfall:

When to choose Agile:

↥ back to top

Q. What is a Sprint Retrospective and how is it conducted?

A Sprint Retrospective is a Scrum ceremony held at the end of each sprint where the team inspects how they worked together and identifies improvements for the next sprint. It is a core pillar of Agile's continuous improvement principle.

Scrum Guide Definition:
The Sprint Retrospective is an opportunity for the Scrum Team to inspect itself and create a plan for improvements to be enacted during the next Sprint. It is timeboxed to a maximum of 3 hours for a one-month sprint (proportionally shorter for shorter sprints).

Participants: Scrum Team (Developers + Scrum Master + Product Owner)

Standard Agenda (3 key questions):

  1. What went well? — Practices, tools, or behaviors to keep and reinforce
  2. What didn't go well? — Pain points, blockers, process failures to address
  3. What will we improve? — 1–3 concrete, actionable improvements for the next sprint

Popular Retrospective Formats:

| Format | Description |
|--------|-------------|
| Start / Stop / Continue | What should we start doing, stop doing, and continue doing? |
| 4Ls | Liked, Learned, Lacked, Longed For |
| Mad / Sad / Glad | Emotional temperature check on the sprint |
| Sailboat | Wind (what helped), Anchors (what slowed us), Rocks (risks ahead), Sun (the goal) |
| 5 Whys | Root cause analysis for recurring issues |
| Timeline | Team maps key events of the sprint chronologically to spot patterns |

Modern Remote Retrospective Tools:

Output: A short list of improvement actions (ideally 1–3) added as user stories or tasks to the next sprint backlog, with clear owners and acceptance criteria.

Common Anti-patterns to Avoid:

↥ back to top

Q. What is the Critical Path Method (CPM) in project management?

The Critical Path Method (CPM) is a project scheduling algorithm used to determine the longest sequence of dependent tasks that defines the minimum possible project duration. Tasks on the critical path have zero float (slack) — any delay in them directly delays the project completion date.

Key Concepts:

| Term | Definition |
|------|------------|
| Activity | A discrete task with a defined duration |
| Dependency | Relationship between tasks (FS, SS, FF, SF) |
| Early Start (ES) | Earliest a task can start |
| Early Finish (EF) | ES + Duration |
| Late Start (LS) | Latest a task can start without delaying the project |
| Late Finish (LF) | LS + Duration |
| Float / Slack | LF − EF (or LS − ES) — time a task can be delayed without affecting project end |
| Critical Path | The sequence of tasks with zero float |

Steps to calculate the Critical Path:

  1. List all activities and their durations.
  2. Identify dependencies between activities.
  3. Draw the network diagram (Activity on Node or Activity on Arrow).
  4. Forward pass — calculate Early Start and Early Finish for each task.
  5. Backward pass — calculate Late Start and Late Finish from the project end.
  6. Calculate float for each task: Float = LS − ES.
  7. Identify the critical path — all tasks with Float = 0.

Example Network: activity A is followed by two parallel branches, B and C, which both feed into D.

```text
Start → A(3d) ──→ B(4d) ──┐
              └─→ C(6d) ──┴─→ D(2d) → End
```

Path A→B→D takes 3 + 4 + 2 = 9 days, while A→C→D takes 3 + 6 + 2 = 11 days. The critical path is therefore A→C→D (11 days), and B carries 11 − 9 = 2 days of float.
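
A minimal forward/backward-pass sketch in JavaScript, assuming the four-task network above (task IDs and the topological order are hard-coded for brevity; this is not a full scheduler):

```js
// Tasks: duration in days plus dependency IDs.
const tasks = {
  A: { dur: 3, deps: [] },
  B: { dur: 4, deps: ['A'] },
  C: { dur: 6, deps: ['A'] },
  D: { dur: 2, deps: ['B', 'C'] },
};
const order = ['A', 'B', 'C', 'D']; // assumed topologically sorted

// Forward pass: Early Start / Early Finish
const es = {}, ef = {};
for (const id of order) {
  es[id] = Math.max(0, ...tasks[id].deps.map((d) => ef[d]));
  ef[id] = es[id] + tasks[id].dur;
}
const projectEnd = Math.max(...Object.values(ef));

// Backward pass: Late Finish / Late Start
const lf = {}, ls = {};
for (const id of [...order].reverse()) {
  const successors = order.filter((s) => tasks[s].deps.includes(id));
  lf[id] = successors.length ? Math.min(...successors.map((s) => ls[s])) : projectEnd;
  ls[id] = lf[id] - tasks[id].dur;
}

// Critical path = tasks with zero float (LS − ES)
const critical = order.filter((id) => ls[id] - es[id] === 0);
console.log(projectEnd, critical); // 11 [ 'A', 'C', 'D' ]
```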

Benefits:

CPM vs. PERT:

| Aspect | CPM | PERT |
|--------|-----|------|
| Duration | Single deterministic estimate | Three-point estimate (O, M, P) |
| Focus | Schedule optimisation | Schedule uncertainty/risk |
| Best for | Repetitive, well-understood tasks | R&D, novel, uncertain projects |

Modern tooling: Microsoft Project, Primavera P6, Smartsheet, and ProjectLibre all compute the critical path automatically from task dependencies.

↥ back to top

Q. What is Earned Value Management (EVM)?

Earned Value Management (EVM) is a project performance measurement technique that integrates scope, schedule, and cost into a single framework. It provides objective data on how much work has been completed relative to what was planned and what was spent.

Core EVM Metrics:

| Metric | Abbreviation | Definition |
|--------|--------------|------------|
| Planned Value | PV | Budgeted cost of work scheduled to be done by a point in time |
| Earned Value | EV | Budgeted cost of work actually completed (% complete × Budget at Completion) |
| Actual Cost | AC | Actual money spent on the work completed so far |
| Budget at Completion | BAC | Total approved budget for the project |

Key Variances:

\[\text{Schedule Variance (SV)} = EV - PV\]
\[\text{Cost Variance (CV)} = EV - AC\]

Performance Indices:

\[\text{Schedule Performance Index (SPI)} = \frac{EV}{PV}\]
\[\text{Cost Performance Index (CPI)} = \frac{EV}{AC}\]

Forecasting:

\[\text{Estimate at Completion (EAC)} = \frac{BAC}{CPI}\]
\[\text{Estimate to Complete (ETC)} = EAC - AC\]
\[\text{Variance at Completion (VAC)} = BAC - EAC\]

Example (figures assumed for illustration): suppose BAC = $100,000, the project is 40% complete (so EV = $40,000), PV = $50,000, and AC = $45,000. Then SV = −$10,000 (behind schedule), CV = −$5,000 (over budget), SPI = 0.80, CPI ≈ 0.89, and EAC = BAC / CPI = $112,500.
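
The same worked example as a runnable JavaScript sketch (all figures are assumptions for illustration):

```js
// EVM metrics for an assumed example project (illustrative figures only).
const BAC = 100_000;       // total approved budget
const pctComplete = 0.40;  // 40% of the work is done
const PV = 50_000;         // value of work scheduled to date
const AC = 45_000;         // actual spend to date

const EV  = pctComplete * BAC; // 40,000
const SV  = EV - PV;           // -10,000 → behind schedule
const CV  = EV - AC;           // -5,000  → over budget
const SPI = EV / PV;           // 0.80
const CPI = EV / AC;           // ≈ 0.89
const EAC = BAC / CPI;         // 112,500 → forecast total cost
const ETC = EAC - AC;          // 67,500  → forecast remaining spend
const VAC = BAC - EAC;         // -12,500 → forecast overrun

console.log({ EV, SV, CV, SPI, CPI, EAC, ETC, VAC });
```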

Benefits:

↥ back to top

Q. What is a Project Charter and what does it contain?

A Project Charter is a formal document that officially authorises the existence of a project and grants the Project Manager the authority to apply organisational resources to project activities. It is typically issued by a project sponsor (senior stakeholder) and serves as the project's constitution.

Key Contents of a Project Charter:

| Section | Description |
|---------|-------------|
| Project Title | Name and unique identifier of the project |
| Project Purpose / Business Case | Why the project is being undertaken and the problem/opportunity it addresses |
| Objectives | Specific, measurable goals (aligned to SMART criteria or OKRs) |
| Scope Summary | High-level description of what is in and out of scope |
| Deliverables | Key outputs the project will produce |
| Milestones | Major schedule checkpoints and target dates |
| Budget Summary | High-level cost estimate and approved budget |
| Stakeholders | Key stakeholders, their roles, and level of influence/interest |
| Project Team | Project Manager, Tech Lead, and core team members |
| Assumptions and Constraints | Factors assumed to be true and fixed limitations |
| Risks (High Level) | Initial identification of major risks |
| Success Criteria | How project success will be measured |
| Approval / Sign-off | Sponsor signature authorising the project |

Why the Project Charter matters:

Project Charter vs. Project Plan:

| Aspect | Project Charter | Project Plan |
|--------|-----------------|--------------|
| Purpose | Authorise and initiate | Plan and execute |
| Detail level | High-level | Detailed |
| Created by | Sponsor + PM | PM + Team |
| Timing | Before project starts | After charter approval |
| Length | 1–3 pages | Multi-document |

↥ back to top

Q. What is the difference between a Product Owner and a Project Manager?

The Product Owner (PO) and Project Manager (PM) are distinct roles that often coexist in organisations running Agile or hybrid delivery models. They have different focuses, accountabilities, and skill sets.

Product Owner:
A Scrum role (defined in the Scrum Guide) responsible for maximising the value of the product by managing and prioritising the product backlog. The PO represents the voice of the customer and business stakeholders.

Project Manager:
A role responsible for delivering the project on time, within scope and budget, managing risks, stakeholders, resources, and governance. The PM is outcome-focused at the project level, not the product level.

Comparison:

| Dimension | Product Owner | Project Manager |
|-----------|---------------|-----------------|
| Primary focus | Product value and backlog | Project delivery and governance |
| Accountability | What gets built and why | When, how much, who builds it |
| Framework | Scrum / Agile | PMBOK / PRINCE2 / SAFe / Hybrid |
| Time horizon | Ongoing (product lifecycle) | Fixed (project start to close) |
| Stakeholder management | Business + customers | All stakeholders (exec, vendors, team) |
| Success metric | Business outcomes, user adoption | On-time, on-budget, in-scope delivery |
| Backlog ownership | Yes — owns and prioritises | No — consumes the schedule |
| Budget ownership | Typically no | Yes |
| Reports to | CPO / Head of Product | PMO / Programme Manager |

In SAFe (Scaled Agile Framework):
The roles are further separated — the Product Manager owns the Program Backlog (features), while the Product Owner owns the Team Backlog (stories). A Release Train Engineer (RTE) performs the coordination role similar to a Programme Manager.

Overlap in hybrid organisations:
In smaller organisations or hybrid projects, one person may wear both hats. However, the tension between “build the right thing” (PO) and “build the thing right on time” (PM) means separating the roles leads to better outcomes in complex projects.

↥ back to top

Q. What is Change Management in project management?

Change Management in project management refers to the structured process for requesting, evaluating, approving, and implementing changes to a project's agreed scope, schedule, cost, or quality baseline. It prevents unauthorised scope creep while allowing necessary changes to be incorporated in a controlled manner.

Change Control Process:

  1. Change Request (CR) Raised — Any stakeholder identifies a need to change the agreed baseline and submits a formal Change Request document.
  2. Impact Assessment — The PM, Tech Lead, and relevant SMEs assess the impact on scope, schedule, cost, quality, risks, and resources.
  3. Change Control Board (CCB) Review — The CCB (sponsor, PM, key stakeholders) reviews the CR and impact assessment.
  4. Decision — Approved / Rejected / Deferred with documented rationale.
  5. Implementation — Approved changes are planned, resourced, and executed.
  6. Baseline Update — Project baselines (scope, schedule, cost) are formally updated to reflect the approved change.
  7. Communication — All affected stakeholders are notified of the approved change.

Types of Change:

| Type | Description |
|------|-------------|
| Scope change | Addition, removal, or modification of deliverables |
| Schedule change | Change to milestones or delivery dates |
| Cost change | Budget adjustment or reallocation |
| Quality change | Modification to acceptance criteria or standards |
| Risk-driven change | Change triggered by a materialised risk or issue |

Agile Change Management:
In Agile teams, change is embraced by design — the product backlog is continuously refined and reprioritised. However, formal change control still applies to:

Organisational Change Management (OCM):
Beyond project change control, OCM addresses the people side of change — managing resistance, communication, training, and adoption when systems, processes, or structures change. Frameworks include Prosci ADKAR (Awareness, Desire, Knowledge, Ability, Reinforcement) and Kotter's 8-Step Model.

↥ back to top

Q. What are DORA metrics and why do they matter?

DORA metrics (DevOps Research and Assessment) are a set of four key metrics developed by the DORA research team (now part of Google Cloud) that measure the performance of software delivery and operational stability. They are the industry-standard benchmark for engineering team effectiveness.

The Four DORA Metrics:

| Metric | Definition | Measures |
|--------|------------|----------|
| Deployment Frequency (DF) | How often the team deploys to production | Speed / Throughput |
| Lead Time for Changes (LT) | Time from code commit to running in production | Speed / Efficiency |
| Change Failure Rate (CFR) | % of deployments that cause a production failure | Quality / Stability |
| Mean Time to Restore (MTTR) | Time to restore service after a production failure | Resilience / Recovery |

Performance Tiers (DORA 2023 Report):

| Tier | Deployment Frequency | Lead Time | Change Failure Rate | MTTR |
|------|----------------------|-----------|---------------------|------|
| Elite | On-demand (multiple/day) | < 1 hour | 0–5% | < 1 hour |
| High | Daily to weekly | 1 day – 1 week | 5–10% | < 1 day |
| Medium | Weekly to monthly | 1 week – 1 month | 10–15% | 1 day – 1 week |
| Low | < Once per month | > 6 months | 46–60% | > 6 months |

Why DORA metrics matter:

How to improve DORA metrics:

| Metric | Key Improvement Levers |
|--------|------------------------|
| Deployment Frequency | Trunk-based development, feature flags, automated CI/CD pipelines |
| Lead Time for Changes | Smaller PRs, faster code reviews, automated testing, pipeline optimisation |
| Change Failure Rate | Shift-left testing, canary/blue-green deployments, SAST/DAST in pipeline |
| MTTR | Observability (logs, traces, metrics), runbooks, chaos engineering, on-call rotation |
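
As a sketch of how these metrics can be derived from raw delivery data, the snippet below computes all four from a hypothetical deployment log (field names and figures are invented for illustration):

```js
// Hypothetical deployment log over a 3-day window.
const deploys = [
  { at: '2024-06-03T10:00Z', commitAt: '2024-06-03T08:00Z', failed: false },
  { at: '2024-06-04T15:00Z', commitAt: '2024-06-04T09:00Z', failed: true, restoredAt: '2024-06-04T15:45Z' },
  { at: '2024-06-05T11:00Z', commitAt: '2024-06-05T10:30Z', failed: false },
];

const hours = (a, b) => (new Date(b) - new Date(a)) / 36e5;
const failures = deploys.filter((d) => d.failed);

console.log({
  deploymentFrequency: deploys.length / 3, // deploys per day over the window
  avgLeadTimeHours:
    deploys.reduce((sum, d) => sum + hours(d.commitAt, d.at), 0) / deploys.length,
  changeFailureRate: failures.length / deploys.length, // ≈ 0.33
  mttrHours:
    failures.reduce((sum, d) => sum + hours(d.at, d.restoredAt), 0) / failures.length,
});
```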

DORA + SPACE Framework:
Modern engineering organisations combine DORA (flow and stability metrics) with the SPACE framework (Satisfaction, Performance, Activity, Communication, Efficiency) to get a more holistic view of developer productivity beyond just deployment speed.

↥ back to top

# 21. SOFTWARE ARCHITECT INTERVIEW QUESTIONS


Q. What is the difference between Software Architecture and Software Design?

| Aspect | Software Architecture | Software Design |
|--------|-----------------------|-----------------|
| Scope | High-level structure of the entire system | Low-level, component/module level decisions |
| Focus | System-wide quality attributes (scalability, availability, security) | Implementation details (classes, algorithms, patterns) |
| Stakeholders | Business, management, all technical teams | Primarily development team |
| Changes | Expensive and hard to change | Easier to refactor |
| Artifacts | Architectural diagrams, ADRs, system blueprints | Class diagrams, sequence diagrams, pseudocode |

Key principle: Architecture defines the skeleton of a system; design fleshes it out. A wrong architectural decision is far more costly to fix than a wrong design decision.

↥ back to top

Q. What are the key quality attributes (Non-Functional Requirements) an architect must consider?

Quality attributes drive architectural decisions and trade-offs:

| Attribute | Description | Common Tactics |
|-----------|-------------|----------------|
| Scalability | Handle increasing load | Horizontal scaling, partitioning, load balancing |
| Availability | System uptime (e.g., 99.99%) | Redundancy, failover, health checks |
| Performance | Response time, throughput | Caching, async processing, CDN |
| Security | Protect data and access | Auth/authz, encryption, input validation |
| Maintainability | Ease of changes | Modularity, low coupling, documentation |
| Testability | Ability to verify behavior | Dependency injection, contract tests |
| Observability | System insight in production | Logging, metrics, distributed tracing |
| Resilience | Recover from failures | Circuit breaker, retry, bulkhead |
| Portability | Deploy across environments | Containerization, IaC, 12-Factor App |

Trade-offs are inevitable — optimizing for all attributes simultaneously is impossible. An architect must communicate and justify trade-offs clearly.

↥ back to top

Q. Explain Microservices Architecture and when you would NOT use it.

Microservices Architecture decomposes an application into small, independently deployable services, each owning its data and communicating over APIs (REST, gRPC, messaging).

Benefits:

When NOT to use microservices:

  1. Small team / startup — operational overhead outweighs benefits; a well-structured monolith is faster to iterate
  2. Unclear domain boundaries — premature decomposition leads to chatty, tightly coupled services (distributed monolith)
  3. Low operational maturity — microservices require CI/CD, container orchestration, observability, and service mesh
  4. Latency-sensitive workflows — network hops between services add latency compared to in-process calls
  5. Shared database requirements — if services must share a database, you lose the key benefit of independent data ownership

Martin Fowler's advice: Start with a monolith, identify seams, then extract services as boundaries become clear.

↥ back to top

Q. What is the CAP Theorem and how does it influence database selection?

CAP Theorem states that a distributed system can guarantee at most two of the following three properties simultaneously:

- Consistency (C): every read receives the most recent write or an error
- Availability (A): every request receives a (non-error) response, without guaranteeing it contains the most recent write
- Partition tolerance (P): the system continues to operate despite network partitions

Since network partitions are unavoidable in distributed systems, the real trade-off is CP vs AP:

| Choice | Example Databases | Use Case |
|--------|-------------------|----------|
| CP (Consistent + Partition-tolerant) | HBase, MongoDB (strong mode), Zookeeper | Financial transactions, inventory systems |
| AP (Available + Partition-tolerant) | Cassandra, CouchDB, DynamoDB (eventual) | Social feeds, shopping carts, DNS |

PACELC extends CAP by also considering latency vs consistency trade-offs when there is no partition.

↥ back to top

Q. What is Event-Driven Architecture (EDA) and what are its benefits and challenges?

Event-Driven Architecture is a design paradigm where components communicate by producing and consuming events (immutable records of something that happened) via a message broker (Kafka, RabbitMQ, AWS SNS/SQS).

Core patterns:

Benefits:

Challenges:

↥ back to top

Q. How would you design a system for 10 million concurrent users?

This is a classic scalability question. A structured answer covers multiple layers:

1. Load Balancing

2. Stateless Application Tier

3. Caching Strategy

4. Database Tier

5. Asynchronous Processing

6. Observability

7. Resilience

↥ back to top

Q. What is Domain-Driven Design (DDD) and how does it relate to microservices?

Domain-Driven Design (DDD) is a software development approach that aligns the software model with the business domain using a shared language (Ubiquitous Language) between developers and domain experts.

Core DDD concepts:

| Concept | Description |
|---------|-------------|
| Domain | The business problem space being solved |
| Ubiquitous Language | Shared vocabulary used in code and conversation |
| Bounded Context | Explicit boundary within which a model applies |
| Aggregate | Cluster of domain objects treated as a single unit with a root entity |
| Entity | Object with a unique identity persisting over time |
| Value Object | Immutable object defined by its attributes, no identity |
| Domain Event | Something significant that happened in the domain |
| Repository | Abstraction for data access |
| Anti-Corruption Layer (ACL) | Translates between bounded contexts to avoid model leakage |

DDD → Microservices mapping:

↥ back to top

Q. What are the SOLID principles and how do they apply at the architectural level?

| Principle | Statement | Architectural Application |
|-----------|-----------|---------------------------|
| S — Single Responsibility | A module should have one reason to change | Each microservice owns one business capability |
| O — Open/Closed | Open for extension, closed for modification | Use plugin architectures, feature flags, strategy pattern |
| L — Liskov Substitution | Subtypes must be substitutable for base types | Enforce API contracts; a service upgrade must not break consumers |
| I — Interface Segregation | Clients shouldn't depend on interfaces they don't use | Design fine-grained APIs; avoid fat shared libraries |
| D — Dependency Inversion | Depend on abstractions, not concretions | Use dependency injection; services depend on interfaces, not implementations |

At scale, SOLID applies across service and module boundaries, not just within a class hierarchy.

↥ back to top

Q. Explain the Strangler Fig Pattern for legacy system migration.

The Strangler Fig Pattern (Martin Fowler) enables incremental migration from a legacy monolith to a new architecture without a risky “big-bang” rewrite.

Steps:

  1. Place a facade/proxy (API Gateway or reverse proxy) in front of the legacy system
  2. Identify a bounded feature slice to migrate first
  3. Build the new service implementing that feature
  4. Redirect traffic for that feature from the legacy system to the new service via the facade
  5. Repeat for remaining features until the legacy system is “strangled” (replaced entirely)
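
A minimal facade sketch, assuming Express and the http-proxy-middleware package; the service names and routes are hypothetical:

```js
const express = require('express');
const { createProxyMiddleware } = require('http-proxy-middleware');

const app = express();

// Migrated slice: /orders is now served by the new microservice
app.use('/orders', createProxyMiddleware({
  target: 'http://orders-service:8080',
  changeOrigin: true,
}));

// Everything else still goes to the legacy monolith
app.use('/', createProxyMiddleware({
  target: 'http://legacy-monolith:8000',
  changeOrigin: true,
}));

app.listen(3000);
```

As more slices migrate, routes move from the second rule to dedicated proxies until the monolith receives no traffic at all.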

Benefits:

Challenges:

↥ back to top

Q. What is the difference between Orchestration and Choreography in microservices?

| Aspect | Orchestration | Choreography |
|--------|---------------|--------------|
| Control | Central orchestrator directs all steps | Each service knows what to do and reacts to events |
| Coupling | Services coupled to orchestrator | Services coupled only to events |
| Visibility | Easy to trace flow in one place | Flow is distributed; harder to visualize |
| Failure handling | Orchestrator manages compensating transactions | Each service handles its own failures |
| Example | Saga orchestrated via a workflow engine (Temporal, AWS Step Functions) | Saga choreographed via domain events on Kafka |
| Best for | Complex business workflows with many decision points | Simple, decoupled event flows |

Saga Pattern uses both approaches to handle distributed transactions without two-phase commit (2PC), using compensating transactions to undo completed steps on failure.

↥ back to top

Q. How do you approach API design as an architect? What makes a good API?

Characteristics of a well-designed API:

  1. Consistency — uniform naming conventions, error formats, pagination patterns
  2. Backward compatibility — never break existing consumers; use versioning (/v1/, /v2/)
  3. Principle of Least Surprise — behavior should match developer expectations
  4. Idempotency — PUT, DELETE, and retry-safe POST operations should be idempotent
  5. Proper HTTP semantics — correct status codes, verbs, and headers
  6. Pagination and filtering — for collection endpoints; avoid unbounded responses
  7. Security by design — authentication, authorization, rate limiting, input validation at API layer
  8. Documentation — OpenAPI/Swagger specs as the contract; treat docs as code

API styles and trade-offs:

| Style | Best For | Drawbacks |
|-------|----------|-----------|
| REST | CRUD resources, broad client support | Over-fetching/under-fetching |
| GraphQL | Flexible queries, mobile clients | Complex caching, N+1 queries |
| gRPC | High-performance internal services | Less browser-friendly, steeper learning curve |
| WebSockets | Real-time bidirectional communication | Stateful, harder to scale |
| AsyncAPI / Messaging | Event-driven, async workflows | Eventual consistency, debugging complexity |

↥ back to top

Q. What is the 12-Factor App methodology?

The 12-Factor App is a methodology for building cloud-native, scalable, and maintainable software-as-a-service applications:

| Factor | Principle |
|--------|-----------|
| 1. Codebase | One codebase tracked in VCS; many deploys |
| 2. Dependencies | Explicitly declare and isolate dependencies |
| 3. Config | Store config in the environment, not code |
| 4. Backing Services | Treat databases, queues, etc. as attached resources |
| 5. Build, Release, Run | Strictly separate build and run stages |
| 6. Processes | Execute the app as stateless processes |
| 7. Port Binding | Export services via port binding |
| 8. Concurrency | Scale out via the process model |
| 9. Disposability | Fast startup, graceful shutdown |
| 10. Dev/Prod Parity | Keep development and production as similar as possible |
| 11. Logs | Treat logs as event streams |
| 12. Admin Processes | Run admin tasks as one-off processes |
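
As a small illustration of Factor 3 (Config), a Node.js service can read its settings from the environment rather than from code (the variable names here are hypothetical examples):

```js
// Factor 3: config lives in the environment, not in the codebase.
// DATABASE_URL and PORT are example variable names.
const config = {
  databaseUrl: process.env.DATABASE_URL,
  port: Number(process.env.PORT) || 3000,
};

console.log(`Starting on port ${config.port}`);
```
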
↥ back to top

Q. What are Architecture Decision Records (ADRs) and why are they important?

An Architecture Decision Record (ADR) is a short document that captures an important architectural decision, including its context, the decision made, and the consequences.

Standard ADR structure (Nygard format):

**ADR-001: Use PostgreSQL as the primary datastore**

**Status**

Accepted

**Context**

We need a relational datastore that supports ACID transactions for our
order management system. The team has strong SQL expertise.

**Decision**

We will use PostgreSQL 15 hosted on AWS RDS with read replicas.

**Consequences**

- Strong consistency and ACID guarantees for order records
- Need to plan for schema migrations (use Flyway/Liquibase)
- Horizontal write scaling requires sharding strategy in the future
- Licensing: open source, no cost concerns

Why ADRs matter:

↥ back to top

Q. How do you ensure security in a distributed system architecture?

Defense in Depth — apply security controls at every layer:

1. Network Layer

2. Identity and Access

3. Application Layer

4. Data Layer

5. Operational

↥ back to top

Q. What is the difference between Horizontal and Vertical scaling?

| Aspect | Vertical Scaling (Scale Up) | Horizontal Scaling (Scale Out) |
|--------|-----------------------------|--------------------------------|
| Approach | Add more CPU/RAM/storage to existing machine | Add more machines/instances |
| Limit | Physical hardware ceiling | Virtually unlimited (cloud) |
| Cost | Exponential cost growth at high end | Linear cost growth |
| Downtime | Often requires restart | No downtime with rolling deployment |
| Complexity | Simple — no code changes | Requires stateless design, load balancing |
| Best for | Databases (initially), legacy apps | Stateless web/API tiers, microservices |

Architectural implication: Design stateless services from the start to enable horizontal scaling. Offload state to distributed caches (Redis) or databases.

↥ back to top

Q. Explain the Circuit Breaker pattern and how it improves system resilience.

The Circuit Breaker pattern (Michael Nygard, Release It!) prevents a failing service from causing cascading failures across a distributed system.

States:

```text
[CLOSED] ──(failures exceed threshold)──► [OPEN] ──(timeout expires)──► [HALF-OPEN]
    ▲                                                                          │
    └──────────────────(probe request succeeds)────────────────────────────────┘
```

Implementation in Node.js (using opossum):

```js
const CircuitBreaker = require('opossum');

// Placeholder for the protected call, e.g. an HTTP request to a downstream API
async function callExternalService() { /* ... */ }

const breaker = new CircuitBreaker(callExternalService, {
  timeout: 3000,                 // if the call takes > 3s, count it as a failure
  errorThresholdPercentage: 50,  // open when 50% of requests fail
  resetTimeout: 10000,           // after 10s, move to HALF-OPEN and probe
});

breaker.fallback(() => ({ data: null, source: 'cache' }));
breaker.on('open', () => console.warn('Circuit OPEN — calls short-circuited'));
breaker.on('halfOpen', () => console.info('Circuit HALF-OPEN — probing'));
breaker.on('close', () => console.info('Circuit CLOSED — service recovered'));
```

Related patterns: Retry with exponential backoff, Bulkhead (isolate failure domains), Timeout (don't wait forever).

↥ back to top

Q. What is CQRS and Event Sourcing? When should you use them?

CQRS (Command Query Responsibility Segregation):

Separates the write model (Commands — mutate state) from the read model (Queries — retrieve state).

```text
Client ──► Command Handler ──► Write DB (normalized)
                                     │
                              (event/projection)
                                     ▼
Client ──► Query Handler ──────► Read DB (denormalized, optimized per use case)
```

Benefits: Read and write models can scale independently; read models can be tailored for specific query patterns.

Event Sourcing:

Instead of storing the current state, store a sequence of events that led to that state. Current state is derived by replaying events.

```js
const events = [
  { type: 'OrderPlaced',     orderId: '1', items: [/* ... */] },
  { type: 'PaymentReceived', orderId: '1', amount: 99.99 },
  { type: 'OrderShipped',    orderId: '1', trackingId: 'XYZ' },
];

// Current state is derived by replaying the events over an initial state,
// where applyEvent is an application-defined reducer per event type:
// const currentState = events.reduce(applyEvent, initialState);
```

Benefits: Complete audit trail, temporal queries (“what was the state at time T?”), event replay for projections.

When to use:

↥ back to top

Q. How would you handle distributed transactions across microservices?

Traditional ACID transactions don't span service boundaries. Distributed transaction strategies:

1. Saga Pattern (recommended)

A sequence of local transactions, each publishing an event/message to trigger the next step. On failure, compensating transactions undo completed steps.

```text
PlaceOrder → ReserveInventory → ProcessPayment → ShipOrder
                                      │ FAIL
                              ◄── ReleaseInventory ◄── CancelOrder (compensate)
```
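
A minimal orchestration-style saga sketch in JavaScript; every step and compensation function here is a hypothetical stand-in for a local transaction in a separate service:

```js
// Hypothetical local-transaction steps and their compensations (stubs).
const reserveInventory = async (order) => { /* local tx in inventory service */ };
const releaseInventory = async (order) => { /* compensation */ };
const processPayment   = async (order) => { /* local tx in payment service */ };
const refundPayment    = async (order) => { /* compensation */ };
const shipOrder        = async (order) => { /* local tx in shipping service */ };
const cancelShipment   = async (order) => { /* compensation */ };

async function placeOrderSaga(order) {
  const steps = [
    { run: reserveInventory, undo: releaseInventory },
    { run: processPayment,   undo: refundPayment },
    { run: shipOrder,        undo: cancelShipment },
  ];
  const completed = [];
  try {
    for (const step of steps) {
      await step.run(order);
      completed.push(step);
    }
  } catch (err) {
    // Undo completed local transactions in reverse order
    for (const step of completed.reverse()) {
      await step.undo(order);
    }
    throw err;
  }
}
```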

2. Two-Phase Commit (2PC) — generally avoided

3. Outbox Pattern — ensures atomicity between database writes and message publishing: the outgoing event is written to an "outbox" table in the same local transaction as the business data, and a separate relay process (or CDC) reads the outbox and publishes to the broker.

4. Idempotent Consumers — all event consumers must be idempotent (processing the same event twice produces the same result) to handle at-least-once delivery semantics.

↥ back to top

Q. What is the difference between REST, GraphQL, and gRPC? When do you choose each?

| Factor | REST | GraphQL | gRPC |
|--------|------|---------|------|
| Protocol | HTTP/1.1 | HTTP/1.1 | HTTP/2 |
| Data format | JSON/XML | JSON | Protocol Buffers (binary) |
| Schema | OpenAPI (optional) | Strongly typed schema | Strongly typed .proto |
| Fetching | Fixed endpoint, fixed response | Client-specified fields | Fixed RPC methods |
| Performance | Good | Good (can be N+1 issue) | Excellent (binary, multiplexing) |
| Browser support | Native | Native | Limited (grpc-web required) |
| Streaming | Limited (SSE) | Subscriptions | Native bidirectional streaming |

Choose REST when: Building public APIs, browser clients, broad interoperability needed.

Choose GraphQL when: Mobile/frontend clients need flexible queries, multiple clients with different data needs, BFF (Backend for Frontend) layer.

Choose gRPC when: Internal service-to-service communication, high-throughput/low-latency requirements, polyglot environments needing strong contracts.

↥ back to top

Q. How do you approach observability in a distributed system?

Observability is built on three pillars:

1. Logs

2. Metrics

3. Traces

Implementation in Node.js:

```js
const { NodeSDK } = require('@opentelemetry/sdk-node');
const { getNodeAutoInstrumentations } = require('@opentelemetry/auto-instrumentations-node');
const { OTLPTraceExporter } = require('@opentelemetry/exporter-trace-otlp-http');

const sdk = new NodeSDK({
  traceExporter: new OTLPTraceExporter({ url: 'http://jaeger:4318/v1/traces' }),
  instrumentations: [getNodeAutoInstrumentations()],
});
sdk.start();
```

4. Alerting

↥ back to top

Q. What is the difference between a Monolith, SOA, and Microservices?

| Aspect | Monolith | SOA | Microservices |
|--------|----------|-----|---------------|
| Deployment unit | Single deployable artifact | Large services (shared ESB) | Many small independent services |
| Communication | In-process function calls | SOAP/ESB (heavy middleware) | REST/gRPC/messaging (lightweight) |
| Data | Shared database | Often shared database | Each service owns its data |
| Team structure | Feature teams span the whole app | Cross-functional but large | Small, autonomous teams (two-pizza rule) |
| Scalability | Scale everything together | Coarse-grained scaling | Fine-grained scaling per service |
| Operational complexity | Low | Medium | High |
| Best for | Early-stage products, small teams | Enterprise integration scenarios | Large-scale, complex domains with many teams |

Key insight: Microservices solve organizational complexity (Conway's Law) as much as they solve technical scaling problems.

↥ back to top

Q. How do you handle backward compatibility and versioning in APIs?

Versioning strategies:

| Strategy | Example | Pros | Cons |
|----------|---------|------|------|
| URI versioning | /v1/users | Simple, explicit, cacheable | URL proliferation |
| Header versioning | Accept: application/vnd.api.v2+json | Clean URLs | Less discoverable |
| Query parameter | /users?version=2 | Easy testing | Unconventional |
| Content negotiation | Accept-Version: 2 | REST-purist approach | Less tooling support |

Backward-compatible changes (non-breaking): adding new optional fields or query parameters, adding new endpoints, adding new enum values that clients are told to ignore, and relaxing validation rules.

Breaking changes (require a new version): removing or renaming fields or endpoints, changing field types or semantics, making optional inputs required, and tightening validation rules.

Best practices: version from day one, publish a clear deprecation policy and timeline, announce removals ahead of time (e.g., via deprecation headers and changelogs), and run contract tests against every supported version.
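
A small URI-versioning sketch with Express, where routes and response fields are hypothetical; v1 keeps its original contract while v2 introduces a breaking field rename:

```js
const express = require('express');
const app = express();

// v1 keeps its original contract for existing consumers
app.get('/v1/users/:id', (req, res) => {
  res.json({ id: req.params.id, name: 'Ada Lovelace' });
});

// v2 introduces a breaking change (renamed field) under a new version path
app.get('/v2/users/:id', (req, res) => {
  res.json({ id: req.params.id, fullName: 'Ada Lovelace' });
});

app.listen(3000);
```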

↥ back to top

Q. What is Infrastructure as Code (IaC) and why is it important for architects?

Infrastructure as Code treats infrastructure provisioning (servers, networks, databases, load balancers) as software — defined in version-controlled files, applied automatically.

Key tools:

| Tool | Type | Approach |
|------|------|----------|
| Terraform | Declarative IaC | Multi-cloud, state-based |
| AWS CloudFormation | Declarative IaC | AWS-native |
| Pulumi | Imperative IaC | Real programming languages |
| Ansible | Configuration management | Agentless, YAML playbooks |
| Helm | Kubernetes package manager | Templated K8s manifests |

Why architects care:

↥ back to top

Q. How do you evaluate and select a technology stack for a new project?

A structured evaluation framework:

1. Business constraints

2. Technical fit

3. Team capabilities

4. Long-term considerations

5. Proof of Concept

Common pitfall: Choosing technology because it's trending rather than because it solves the actual problem better than alternatives.

↥ back to top

Q. What is Zero Downtime Deployment and what strategies enable it?

Zero Downtime Deployment ensures users experience no service interruption during a new version release.

Strategies:

1. Rolling Deployment

2. Blue-Green Deployment

3. Canary Release

4. Feature Flags

Database migration challenge: schema changes must remain compatible with both the old and the new application version running side by side. This is typically handled with the expand–contract (parallel change) pattern: first add the new schema alongside the old (expand), then migrate code and data, and finally remove the old schema (contract).

↥ back to top

Q. How do you design for failure? What is chaos engineering?

Design for Failure Principles:

Chaos Engineering (pioneered by Netflix's Chaos Monkey) is the practice of intentionally injecting failures into a system in production (or production-like environments) to proactively discover weaknesses.

Chaos Engineering process:

  1. Define steady state (key business/technical metrics)
  2. Hypothesize: “If we kill this instance, steady state will be maintained”
  3. Inject failure (instance termination, network latency, disk full, CPU spike)
  4. Observe and compare against steady state
  5. Fix discovered weaknesses; repeat

Tools: Netflix Chaos Monkey, Gremlin, LitmusChaos (Kubernetes), AWS Fault Injection Simulator

Key distinction: Chaos Engineering is not random destruction — it is a controlled scientific experiment to build confidence in system resilience.

↥ back to top