
Project Management Basics

Click ★ if you like the project. Your contributions are heartily ♡ welcome.


Table of Contents


Questions

| No. | Questions |
|-----|-----------|
| 1 | What are the scientific ways to do project estimation? |
| 2 | What is planning poker estimation technique? |
| 3 | What is Ballpark Figures estimate? |
| 4 | What are the tools used for requirements gathering? |
| 5 | Explain the concept of RAID in project management? |
| 6 | What are the techniques used to define the scope of a project? |
| 7 | Explain Ishikawa/Fishbone diagrams? |
| 8 | What is the process of calculating the three-point estimating method? |
| 9 | What is Work Breakdown Structure (WBS)? |
| 10 | What is the Pareto principle analysis? |
| 11 | What is Gherkin approach for writing user stories? |
| 12 | What are the roles and responsibilities of a Technical Lead? |
| 13 | What is the difference between Agile and Waterfall methodologies? |
| 14 | What is a Sprint Retrospective and how is it conducted? |
| 15 | What is the Critical Path Method (CPM) in project management? |
| 16 | What is Earned Value Management (EVM)? |
| 17 | What is a Project Charter and what does it contain? |
| 18 | What is the difference between a Product Owner and a Project Manager? |
| 19 | What is Change Management in project management? |
| 20 | What are DORA metrics and why do they matter? |
| 21 | Software Architect Interview Questions |


Q. What are the scientific ways to do project estimation?

There are many different estimation techniques used in project management across domains such as Engineering, IT, Construction, Agriculture, and Accounting. A project manager is challenged to balance six project constraints (Scope, Time, Cost, Quality, Resources, and Risk) in order to estimate the project accurately.

While accurate estimates are the basis of sound project planning, many techniques are used as project management best practices in estimation, including Analogous estimation, Parametric estimation, the Delphi method, Three-Point estimation, Expert Judgment, Published Data Estimates, Vendor Bid Analysis, Reserve Analysis, Bottom-Up Analysis, and Simulation. During the early stages of a project life cycle, the requirements are only vaguely known and little information is available for estimating. The initial estimate is drawn largely from assumptions, with the scope known only at a high level; this is the 'ballpark estimate', a term project managers use very often.

Top-down estimate:
This technique is used once more detail about the project scope is known: high-level chunks at the feature or design level are estimated and progressively decomposed into smaller chunks or work packets as more information becomes available.

Bottom-up estimate:
This technique is used when the requirements are known only at a discrete level: the smaller work pieces are estimated first and then aggregated to produce the estimate for the entire project.

Analogous estimating:
This technique is used when a similar, previously executed project can serve as a reference. Expert judgment and historical information from comparable activities in the reference project are used to arrive at an estimate.

Parametric estimate:
This technique uses independent, measurable variables from the project work. For example, the cost of constructing a building can be calculated from the cost per square foot, and the effort required to build a work packet can be derived from a variable such as lines of code in a software development project. This technique gives more accurate project estimates.

Three-point estimating:
This technique takes the weighted average of an optimistic, a most likely, and a pessimistic estimate of the work package. It is often known as PERT (Program Evaluation and Review Technique).

What-if analysis:
This technique evaluates possible project outcomes by varying assumptions about factors such as scope, time, cost, and resources, and analyzing the impact of each combination. In practice, the project estimate is produced in estimation workshops with project stakeholders and senior team members who can give valuable input to the exercise. The high-level scope is broken down into smaller work packages, components, and activities, and each work package is estimated by the effort and resources needed to complete it, down to the smallest chunk that can be measured.

T-shirt Sizing (Modern Agile):
A relative estimation technique where work items are sized as XS, S, M, L, XL, or XXL instead of hours or story points. It is fast, collaborative, and avoids false precision. Widely used in SAFe (Scaled Agile Framework) PI Planning and early backlog grooming when detailed estimates are not yet feasible.

Monte Carlo Simulation (Advanced):
A probabilistic estimation technique that runs thousands of simulated project scenarios using historical velocity data and task variability to generate a probability distribution of completion dates and costs. Used in risk-heavy and large-scale projects for confidence-interval forecasting (e.g., “there is an 85% probability of delivery by Q3”).

AI-Assisted Estimation (Emerging Practice):
Modern teams leverage AI tools (GitHub Copilot, Jira AI, Azure DevOps Copilot) to assist in effort estimation by analyzing historical project data, comparing similar past user stories, and suggesting story point ranges. AI does not replace expert judgment but reduces anchoring bias and speeds up estimation workshops.

PMBOK 7th Edition Note: The latest PMI standard (2021) shifts from prescriptive processes to 12 principles and 8 performance domains, emphasizing adaptability and outcomes over rigid methodologies. Estimation is now treated as a continuous activity rather than a one-time planning event, aligned with hybrid and agile delivery models.

↥ back to top

Q. What is planning poker estimation technique?

Planning poker, also called Scrum poker, is a consensus-based, gamified estimation technique, used mostly to estimate the effort or relative size of development goals in software development. To start a planning poker session, the product owner or customer reads an agile user story or describes a feature to the estimators.

Each estimator holds a deck of Planning Poker cards with values like 0, 1, 2, 3, 5, 8, 13, 20, 40, and 100, a commonly used sequence. The values represent the number of story points, ideal days, or other units in which the team estimates.

The estimators discuss the feature, asking questions of the product owner as needed. When the feature has been fully discussed, each estimator privately selects one card to represent his or her estimate. All cards are then revealed at the same time.

If all estimators selected the same value, that becomes the estimate. If not, the estimators discuss their estimates. The high and low estimators should especially share their reasons. After further discussion, each estimator re-selects an estimate card, and all cards are again revealed at the same time.

The poker planning process is repeated until consensus is achieved or until the estimators decide that agile estimating and planning of a particular item needs to be deferred until additional information can be acquired.

Planning poker combines three methods of estimation: expert opinion, analogy, and disaggregation (breaking large items into smaller, more easily estimated pieces).
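
As a toy illustration of the reveal step, the sketch below models a single round in JavaScript; the function and its output shape are hypothetical, not part of any planning-poker tool.

```js
// One planning-poker round: all cards are revealed at once.
// If everyone picked the same card, that value becomes the estimate;
// otherwise the high and low estimators explain their reasoning
// and the team re-estimates.
function revealRound(estimates) {
  if (new Set(estimates).size === 1) {
    return { consensus: true, estimate: estimates[0] };
  }
  return {
    consensus: false,
    low: Math.min(...estimates),
    high: Math.max(...estimates),
  };
}

console.log(revealRound([5, 8, 5, 13])); // { consensus: false, low: 5, high: 13 }
```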

Modern Enhancements to Planning Poker:

↥ back to top

Q. What is Ballpark Figures estimate?

A ballpark figure is a rough numerical estimate or approximation of the value of something that is otherwise unknown. Ballpark figures are commonly used by accountants, salespeople, and other professionals to estimate current or future results. A stockbroker could use a ballpark figure to estimate how much money a client might have at some point in the future, given a certain rate of growth. A salesperson could use a ballpark figure to estimate how long a product that a customer is considering might remain viable.

A ballpark figure is essentially a placeholder: an approximation of what the amount or total of something might be, established so that the parties involved can move forward in whatever negotiation or planning is underway. As a concept, it has applications in business estimates as well as in everyday life.

Key takeaways:

- Ballpark figures are estimates used to move a discussion or deal forward when the exact measurement of the size or amount of something cannot yet be determined.
- Ballpark figures can be used for day-to-day purposes, such as estimating how much food and beverages might be needed for a barbecue or how many months it will likely take to pay off a new purchase.
- Ballpark figures are also used everywhere in the business world, for example to estimate how much it might cost to expand into a certain market, how many years it might take for a company to become profitable, or how large sales must be to justify a big purchase. They can also be used to estimate public adoption of a concept, technology, or product, as in how many people are likely to buy a certain phone and how long it might take them to upgrade it once purchased.

Q. What are the tools used for requirements gathering?

Requirements gathering is the process of identifying and documenting the needs of stakeholders for a new or modified product or system. Commonly used tools and techniques include:

- Stakeholder interviews and focus groups
- Questionnaires and surveys
- Facilitated requirements workshops (e.g., JAD sessions)
- Brainstorming sessions
- Observation / job shadowing
- Document and process analysis
- Prototypes, wireframes, and mockups
- Use cases and user stories
- Requirements management tools (e.g., Jira, Azure DevOps, Confluence)

↥ back to top

Q. Explain the concept of RAID in project management?

RAID is an acronym that stands for Risks, Assumptions, Issues, and Dependencies. It is a project management tool used to track and manage the key factors that can affect the successful delivery of a project.

A RAID log is a living document, maintained and reviewed regularly throughout the project lifecycle to ensure proactive management of these factors.

RAIDC — Extended Modern Variant:

Some organizations extend the RAID acronym to RAIDC by adding:

Modern RAID Tooling:

| Tool | Usage |
|------|-------|
| Jira | Risks and issues tracked as tickets with labels, priorities, and owners |
| Azure DevOps | Work items and risk registers integrated with sprint boards |
| Confluence / SharePoint | RAID log maintained as a shared, searchable document |
| Monday.com / Smartsheet | Visual RAID dashboards with status tracking and notifications |

RAID in Hybrid and Agile Projects:
In Agile teams, RAID items are surfaced during daily standups (issues), sprint retrospectives (risks and assumptions), and backlog refinement (dependencies). In SAFe, RAID is managed at the Program Increment (PI) level and tracked on the ART (Agile Release Train) risk board using the ROAM technique, where each risk is classified as Resolved, Owned, Accepted, or Mitigated.

↥ back to top

Q. What are the techniques used to define the scope of a project?

Project scope definition involves clearly documenting the boundaries of a project — what is included and what is excluded. Common techniques include:

- Product breakdown and product analysis
- Requirements workshops and stakeholder interviews
- Alternatives analysis and expert judgment
- MoSCoW prioritisation (Must / Should / Could / Won't have)
- A written scope statement listing deliverables, exclusions, constraints, and assumptions
- Decomposition into a Work Breakdown Structure (WBS)

↥ back to top

Q. Explain Ishikawa/Fishbone diagrams?

An Ishikawa diagram, also called a Fishbone diagram or Cause-and-Effect diagram, is a visual tool used to systematically identify and analyze the root causes of a problem or defect. It was developed by Japanese quality control expert Kaoru Ishikawa.

The diagram resembles a fish skeleton: the problem (effect) sits at the fish's head, the major cause categories branch off the central spine as large bones, and detailed causes attach to each branch as smaller bones.

Common cause categories (the 6 Ms used in manufacturing):

| Category | Description |
|----------|-------------|
| Man | Human factors — skills, training, fatigue |
| Machine | Equipment, tools, technology |
| Method | Processes, procedures, workflows |
| Material | Raw materials, components, data |
| Measurement | Metrics, calibration, data accuracy |
| Mother Nature (Environment) | Environmental conditions, workspace |

Steps to create a Fishbone diagram:

  1. Define and write the problem (effect) at the fish head.
  2. Identify the main cause categories and draw them as branches.
  3. Brainstorm potential causes for each category using the “5 Whys” technique.
  4. Analyze the diagram to identify the most likely root causes.
  5. Prioritize causes for further investigation and corrective action.

Benefits:

↥ back to top

Q. What is the process of calculating the three-point estimating method?

The three-point estimating method improves the accuracy of estimates by considering uncertainty and risk. Instead of a single estimate, three scenarios are defined for each task:

| Estimate | Symbol | Description |
|----------|--------|-------------|
| Optimistic | O | Best-case scenario — everything goes perfectly |
| Most Likely | M | The realistic, most probable outcome |
| Pessimistic | P | Worst-case scenario — maximum problems occur |

Two formulas are used:

1. Triangular Distribution (Simple Average):

\[E = \frac{O + M + P}{3}\]

2. Beta Distribution (PERT — Program Evaluation and Review Technique):

\[E = \frac{O + 4M + P}{6}\]

The PERT formula gives four times the weight to the most likely estimate, making it more accurate for project planning.

Standard Deviation (for PERT):

\[SD = \frac{P - O}{6}\]

Example: A software module has the following estimates: Optimistic (O) = 4 days, Most Likely (M) = 6 days, Pessimistic (P) = 14 days.

Triangular estimate: \(E = \frac{4 + 6 + 14}{3} = 8 \text{ days}\)

PERT estimate: \(E = \frac{4 + (4 \times 6) + 14}{6} = \frac{42}{6} = 7 \text{ days}\)

Standard Deviation: \(SD = \frac{14 - 4}{6} \approx 1.67 \text{ days}\)
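
For a quick sanity check of the formulas above, here is a minimal JavaScript sketch (the helper name and output shape are illustrative):

```js
// Three-point estimation: triangular average, PERT weighted average,
// and PERT standard deviation, all from the same O/M/P inputs.
function threePoint(o, m, p) {
  return {
    triangular: (o + m + p) / 3, // simple average
    pert: (o + 4 * m + p) / 6,   // beta/PERT weighted average
    stdDev: (p - o) / 6,         // PERT standard deviation
  };
}

console.log(threePoint(4, 6, 14));
// { triangular: 8, pert: 7, stdDev: 1.666... } — matches the worked example
```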

Benefits:

↥ back to top

Q. What is Work Breakdown Structure (WBS)?

A Work Breakdown Structure (WBS) is a hierarchical decomposition of the total scope of work required to complete a project. It organizes and defines the project's total work into smaller, manageable components called work packages.

Key characteristics:

WBS Structure Levels:

| Level | Description | Example |
|-------|-------------|---------|
| Level 1 | Project | E-Commerce Website |
| Level 2 | Major Deliverables | Frontend, Backend, Database |
| Level 3 | Sub-deliverables | UI Design, API Development |
| Level 4 | Work Packages | Login Page, Product Listing API |

Types of WBS:

Benefits of WBS:

WBS Dictionary: A companion document that provides detailed information about each WBS element including description, responsible party, schedule milestones, required resources, and acceptance criteria.

Agile WBS / Product Breakdown Structure (PBS):
In Agile and hybrid projects, the traditional WBS is complemented or replaced by:

Modern WBS Tooling:

| Tool | Capability |
|------|------------|
| Microsoft Project | Traditional WBS with Gantt chart and resource planning |
| Jira | Hierarchical backlog (Epic → Story → Sub-task) as Agile WBS |
| Azure DevOps | Work item hierarchy (Epic → Feature → User Story → Task) |
| Asana / Monday.com | Visual WBS with timeline and dependency tracking |
| Miro / FigJam | Collaborative WBS whiteboarding for remote teams |

↥ back to top

Q. What is the Pareto principle analysis?

The Pareto Principle, also known as the 80/20 Rule, states that roughly 80% of effects come from 20% of causes. It was named after Italian economist Vilfredo Pareto, who observed that 80% of Italy's land was owned by 20% of the population.

In project and quality management, this principle is applied through Pareto Analysis — a technique used to identify and prioritize the most significant problems or causes to focus effort where it will have the greatest impact.

How Pareto Analysis works:

  1. Identify and list the problems or causes to be analyzed.
  2. Measure the frequency or impact of each problem (e.g., number of defects, cost, time lost).
  3. Sort in descending order from highest to lowest frequency/impact.
  4. Calculate cumulative percentages for each problem category.
  5. Draw a Pareto Chart — a bar chart combined with a line graph showing cumulative percentages.
  6. Identify the vital few — the 20% of causes responsible for 80% of the problems.
  7. Focus corrective action on those top causes first.

Example in Software Projects:

| Bug Category | Count | Cumulative % |
|--------------|-------|--------------|
| UI/UX Issues | 45 | 45% |
| API Errors | 30 | 75% |
| Database Issues | 15 | 90% |
| Config Errors | 7 | 97% |
| Other | 3 | 100% |

Fixing UI/UX and API issues (20% of categories) resolves 75% of all bugs.
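
The cumulative-percentage step is easy to automate. A minimal JavaScript sketch using the table above (data and field names are illustrative):

```js
// Pareto analysis: sort causes by impact and compute cumulative percentages.
const bugs = [
  { category: 'UI/UX Issues', count: 45 },
  { category: 'API Errors', count: 30 },
  { category: 'Database Issues', count: 15 },
  { category: 'Config Errors', count: 7 },
  { category: 'Other', count: 3 },
];

const total = bugs.reduce((sum, b) => sum + b.count, 0);
let running = 0;
const pareto = bugs
  .sort((a, b) => b.count - a.count) // highest impact first
  .map((b) => {
    running += b.count;
    return { ...b, cumulativePct: Math.round((running / total) * 100) };
  });

console.table(pareto); // the top rows are the "vital few"
```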

Applications in project management:

↥ back to top

Q. What is Gherkin approach for writing user stories?

Gherkin is a plain-text, human-readable language used to write acceptance criteria and user stories in a structured format that both business stakeholders and developers can understand. It is the language used by Behavior-Driven Development (BDD) frameworks such as Cucumber, SpecFlow, and Behave.

Gherkin bridges the gap between business requirements and automated test specifications by expressing behavior in natural language using a fixed set of keywords.

Core Keywords:

| Keyword | Purpose |
|---------|---------|
| Feature | High-level description of the feature being tested |
| Scenario | A specific example or test case for the feature |
| Given | The initial context or precondition |
| When | The action or event that occurs |
| Then | The expected outcome or result |
| And / But | Chains multiple Given/When/Then steps |
| Background | Common steps shared across all scenarios in a feature |
| Scenario Outline | A template scenario run with multiple data sets |
| Examples | Data table used with Scenario Outline |

Basic Syntax Example:

```gherkin
Feature: User Login

  Scenario: Successful login with valid credentials
    Given the user is on the login page
    When the user enters a valid username "john@example.com"
    And the user enters a valid password "Secret@123"
    Then the user should be redirected to the dashboard
    And a welcome message "Hello, John" should be displayed
```

Scenario Outline with Examples (data-driven):

```gherkin
Feature: User Login Validation

  Scenario Outline: Login with different credentials
    Given the user is on the login page
    When the user enters username "<username>" and password "<password>"
    Then the login result should be "<result>"

  Examples:
    | username            | password    | result  |
    | john@example.com    | Secret@123  | success |
    | wrong@example.com   | Secret@123  | failure |
    | john@example.com    | wrongpass   | failure |
```

Background Example (shared precondition):

```gherkin
Feature: Shopping Cart

  Background:
    Given the user is logged in
    And the shopping cart is empty

  Scenario: Add item to cart
    When the user adds "Laptop" to the cart
    Then the cart should contain 1 item

  Scenario: Remove item from cart
    Given the user has "Laptop" in the cart
    When the user removes "Laptop" from the cart
    Then the cart should be empty
```

Benefits of Gherkin:

Gherkin vs Traditional User Stories:

| Aspect | Traditional User Story | Gherkin Scenario |
|--------|------------------------|------------------|
| Format | As a [role], I want [goal], so that [benefit] | Given/When/Then |
| Audience | Business + Dev | Business + Dev + QA |
| Testable | Not directly | Directly executable |
| Tooling | Jira, Trello | Cucumber, SpecFlow, Behave |

Modern Gherkin: The Rule Keyword (Gherkin 6+):
The Rule keyword was introduced to group related scenarios under a single business rule within a feature. This makes large feature files more organized and readable.

```gherkin
Feature: User Account Management

  Rule: Users must be verified before accessing premium features

    Scenario: Verified user accesses premium content
      Given the user has verified their email
      When the user navigates to the premium section
      Then access is granted

    Scenario: Unverified user attempts premium access
      Given the user has NOT verified their email
      When the user navigates to the premium section
      Then the user should see a verification prompt
```

Gherkin + AI-Assisted BDD (Emerging Practice):
Modern teams use AI tools to accelerate BDD adoption:

Gherkin in CI/CD Pipelines:
Gherkin scenarios are integrated into modern DevOps pipelines (GitHub Actions, Azure Pipelines, Jenkins) where Cucumber or SpecFlow runs acceptance tests automatically on every pull request, providing living documentation that always reflects the current system behavior.

↥ back to top

Q. What are the roles and responsibilities of a Technical Lead?

A Technical Lead (Tech Lead) is a senior engineering role responsible for guiding the technical direction of a team or project. Unlike a pure individual contributor, the Tech Lead balances hands-on coding with leadership, mentoring, and coordination responsibilities. The role sits at the intersection of engineering and management.

Core Responsibilities:

1. Technical Direction and Architecture

2. Code Quality and Standards

3. Technical Planning and Estimation

4. Mentoring and Team Development

5. Cross-functional Collaboration

6. Delivery and Execution

7. Security and Compliance

Tech Lead vs. Engineering Manager:

| Dimension | Technical Lead | Engineering Manager |
|-----------|----------------|---------------------|
| Primary focus | Technical excellence + delivery | People management + org health |
| Coding | Active contributor (30–70%) | Minimal or none |
| Reports to | Engineering Manager or CTO | VP Engineering or CTO |
| Hiring | Contributes to interviews | Owns hiring decisions |
| Performance reviews | Input provider | Owner |
| Career path | Staff Engineer → Principal → Architect | EM → Director → VP |

Modern Tech Lead Skills (2024–2026):

↥ back to top

Q. What is the difference between Agile and Waterfall methodologies?

Agile and Waterfall are two fundamentally different approaches to software project delivery. Choosing between them — or combining them in a hybrid model — depends on the nature of the project, stakeholder needs, and the level of requirements certainty.

Waterfall:
A sequential, linear project management approach where each phase (Requirements → Design → Development → Testing → Deployment → Maintenance) must be completed before the next begins. Requirements are locked upfront and change is costly.

Agile:
An iterative, incremental delivery approach where work is broken into short cycles (sprints/iterations), enabling continuous feedback, adaptation, and delivery of working software frequently.

Comparison:

| Dimension | Waterfall | Agile |
|-----------|-----------|-------|
| Approach | Sequential, phase-gated | Iterative, incremental |
| Requirements | Fixed upfront | Evolving throughout |
| Delivery | Single delivery at project end | Frequent releases (every 1–4 weeks) |
| Customer involvement | At start and end | Continuous collaboration |
| Change tolerance | Low — changes are costly | High — embraces change |
| Documentation | Heavy upfront documentation | Lightweight, just-enough docs |
| Team structure | Siloed (BA, Dev, QA in sequence) | Cross-functional, self-organizing |
| Risk management | Risks identified upfront | Risks surfaced and addressed iteratively |
| Best suited for | Fixed scope, regulated projects | Complex, innovative, fast-changing products |
| Examples | Construction, compliance systems | SaaS products, mobile apps, AI systems |

Hybrid (Water-Scrum-Fall):
Many enterprise organisations adopt a hybrid model — Waterfall governance (fixed budget, milestones, contracts) wrapped around Agile execution (sprints, backlog, daily standups). This is common in SAFe implementations.

When to choose Waterfall:

When to choose Agile:

↥ back to top

Q. What is a Sprint Retrospective and how is it conducted?

A Sprint Retrospective is a Scrum ceremony held at the end of each sprint where the team inspects how they worked together and identifies improvements for the next sprint. It is a core pillar of Agile's continuous improvement principle.

Scrum Guide Definition:
The Sprint Retrospective is an opportunity for the Scrum Team to inspect itself and create a plan for improvements to be enacted during the next Sprint. It is timeboxed to a maximum of 3 hours for a one-month sprint (proportionally shorter for shorter sprints).

Participants: Scrum Team (Developers + Scrum Master + Product Owner)

Standard Agenda (3 key questions):

  1. What went well? — Practices, tools, or behaviors to keep and reinforce
  2. What didn't go well? — Pain points, blockers, process failures to address
  3. What will we improve? — 1–3 concrete, actionable improvements for the next sprint

Popular Retrospective Formats:

| Format | Description |
|--------|-------------|
| Start / Stop / Continue | What should we start doing, stop doing, and continue doing? |
| 4Ls | Liked, Learned, Lacked, Longed For |
| Mad / Sad / Glad | Emotional temperature check on the sprint |
| Sailboat | Wind (what helped), Anchors (what slowed us), Rocks (risks ahead), Sun (the goal) |
| 5 Whys | Root cause analysis for recurring issues |
| Timeline | Team maps key events of the sprint chronologically to spot patterns |

Modern Remote Retrospective Tools:

Output: A short list of improvement actions (ideally 1–3) added as user stories or tasks to the next sprint backlog, with clear owners and acceptance criteria.

Common Anti-patterns to Avoid:

↥ back to top

Q. What is the Critical Path Method (CPM) in project management?

The Critical Path Method (CPM) is a project scheduling algorithm used to determine the longest sequence of dependent tasks that defines the minimum possible project duration. Tasks on the critical path have zero float (slack) — any delay in them directly delays the project completion date.

Key Concepts:

| Term | Definition |
|------|------------|
| Activity | A discrete task with a defined duration |
| Dependency | Relationship between tasks (FS, SS, FF, SF) |
| Early Start (ES) | Earliest a task can start |
| Early Finish (EF) | ES + Duration |
| Late Start (LS) | Latest a task can start without delaying the project |
| Late Finish (LF) | LS + Duration |
| Float / Slack | LF − EF (or LS − ES) — time a task can be delayed without affecting project end |
| Critical Path | The sequence of tasks with zero float |

Steps to calculate the Critical Path:

  1. List all activities and their durations.
  2. Identify dependencies between activities.
  3. Draw the network diagram (Activity on Node or Activity on Arrow).
  4. Forward pass — calculate Early Start and Early Finish for each task.
  5. Backward pass — calculate Late Start and Late Finish from the project end.
  6. Calculate float for each task: Float = LS − ES.
  7. Identify the critical path — all tasks with Float = 0.

Example Network: activity A is followed by two parallel branches, B and C, which both feed into D.

```text
Start → A(3d) ──→ B(4d) ──┐
              └─→ C(6d) ──┴─→ D(2d) → End
```

Path A→B→D takes 3 + 4 + 2 = 9 days, while A→C→D takes 3 + 6 + 2 = 11 days. The critical path is therefore A→C→D (11 days), and B carries 11 − 9 = 2 days of float.
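
A minimal forward/backward-pass sketch in JavaScript, assuming the four-task network above (task IDs and the topological order are hard-coded for brevity; this is not a full scheduler):

```js
// Tasks: duration in days plus dependency IDs.
const tasks = {
  A: { dur: 3, deps: [] },
  B: { dur: 4, deps: ['A'] },
  C: { dur: 6, deps: ['A'] },
  D: { dur: 2, deps: ['B', 'C'] },
};
const order = ['A', 'B', 'C', 'D']; // assumed topologically sorted

// Forward pass: Early Start / Early Finish
const es = {}, ef = {};
for (const id of order) {
  es[id] = Math.max(0, ...tasks[id].deps.map((d) => ef[d]));
  ef[id] = es[id] + tasks[id].dur;
}
const projectEnd = Math.max(...Object.values(ef));

// Backward pass: Late Finish / Late Start
const lf = {}, ls = {};
for (const id of [...order].reverse()) {
  const successors = order.filter((s) => tasks[s].deps.includes(id));
  lf[id] = successors.length ? Math.min(...successors.map((s) => ls[s])) : projectEnd;
  ls[id] = lf[id] - tasks[id].dur;
}

// Critical path = tasks with zero float (LS − ES)
const critical = order.filter((id) => ls[id] - es[id] === 0);
console.log(projectEnd, critical); // 11 [ 'A', 'C', 'D' ]
```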

Benefits:

CPM vs. PERT:

| Aspect | CPM | PERT |
|--------|-----|------|
| Duration | Single deterministic estimate | Three-point estimate (O, M, P) |
| Focus | Schedule optimisation | Schedule uncertainty/risk |
| Best for | Repetitive, well-understood tasks | R&D, novel, uncertain projects |

Modern tooling: Microsoft Project, Primavera P6, Smartsheet, and ProjectLibre all compute the critical path automatically from task dependencies.

↥ back to top

Q. What is Earned Value Management (EVM)?

Earned Value Management (EVM) is a project performance measurement technique that integrates scope, schedule, and cost into a single framework. It provides objective data on how much work has been completed relative to what was planned and what was spent.

Core EVM Metrics:

| Metric | Abbreviation | Definition |
|--------|--------------|------------|
| Planned Value | PV | Budgeted cost of work scheduled to be done by a point in time |
| Earned Value | EV | Budgeted cost of work actually completed (% complete × Budget at Completion) |
| Actual Cost | AC | Actual money spent on the work completed so far |
| Budget at Completion | BAC | Total approved budget for the project |

Key Variances:

\[\text{Schedule Variance (SV)} = EV - PV\]
\[\text{Cost Variance (CV)} = EV - AC\]

Performance Indices:

\[\text{Schedule Performance Index (SPI)} = \frac{EV}{PV}\]
\[\text{Cost Performance Index (CPI)} = \frac{EV}{AC}\]

Forecasting:

\[\text{Estimate at Completion (EAC)} = \frac{BAC}{CPI}\]
\[\text{Estimate to Complete (ETC)} = EAC - AC\]
\[\text{Variance at Completion (VAC)} = BAC - EAC\]

Example (figures assumed for illustration): suppose BAC = $100,000, the project is 40% complete (so EV = $40,000), PV = $50,000, and AC = $45,000. Then SV = −$10,000 (behind schedule), CV = −$5,000 (over budget), SPI = 0.80, CPI ≈ 0.89, and EAC = BAC / CPI = $112,500.
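
The same worked example as a runnable JavaScript sketch (all figures are assumptions for illustration):

```js
// EVM metrics for an assumed example project (illustrative figures only).
const BAC = 100_000;       // total approved budget
const pctComplete = 0.40;  // 40% of the work is done
const PV = 50_000;         // value of work scheduled to date
const AC = 45_000;         // actual spend to date

const EV  = pctComplete * BAC; // 40,000
const SV  = EV - PV;           // -10,000 → behind schedule
const CV  = EV - AC;           // -5,000  → over budget
const SPI = EV / PV;           // 0.80
const CPI = EV / AC;           // ≈ 0.89
const EAC = BAC / CPI;         // 112,500 → forecast total cost
const ETC = EAC - AC;          // 67,500  → forecast remaining spend
const VAC = BAC - EAC;         // -12,500 → forecast overrun

console.log({ EV, SV, CV, SPI, CPI, EAC, ETC, VAC });
```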

Benefits:

↥ back to top

Q. What is a Project Charter and what does it contain?

A Project Charter is a formal document that officially authorises the existence of a project and grants the Project Manager the authority to apply organisational resources to project activities. It is typically issued by a project sponsor (senior stakeholder) and serves as the project's constitution.

Key Contents of a Project Charter:

| Section | Description |
|---------|-------------|
| Project Title | Name and unique identifier of the project |
| Project Purpose / Business Case | Why the project is being undertaken and the problem/opportunity it addresses |
| Objectives | Specific, measurable goals (aligned to SMART criteria or OKRs) |
| Scope Summary | High-level description of what is in and out of scope |
| Deliverables | Key outputs the project will produce |
| Milestones | Major schedule checkpoints and target dates |
| Budget Summary | High-level cost estimate and approved budget |
| Stakeholders | Key stakeholders, their roles, and level of influence/interest |
| Project Team | Project Manager, Tech Lead, and core team members |
| Assumptions and Constraints | Factors assumed to be true and fixed limitations |
| Risks (High Level) | Initial identification of major risks |
| Success Criteria | How project success will be measured |
| Approval / Sign-off | Sponsor signature authorising the project |

Why the Project Charter matters:

Project Charter vs. Project Plan:

| Aspect | Project Charter | Project Plan |
|--------|-----------------|--------------|
| Purpose | Authorise and initiate | Plan and execute |
| Detail level | High-level | Detailed |
| Created by | Sponsor + PM | PM + Team |
| Timing | Before project starts | After charter approval |
| Length | 1–3 pages | Multi-document |

↥ back to top

Q. What is the difference between a Product Owner and a Project Manager?

The Product Owner (PO) and Project Manager (PM) are distinct roles that often coexist in organisations running Agile or hybrid delivery models. They have different focuses, accountabilities, and skill sets.

Product Owner:
A Scrum role (defined in the Scrum Guide) responsible for maximising the value of the product by managing and prioritising the product backlog. The PO represents the voice of the customer and business stakeholders.

Project Manager:
A role responsible for delivering the project on time, within scope and budget, managing risks, stakeholders, resources, and governance. The PM is outcome-focused at the project level, not the product level.

Comparison:

| Dimension | Product Owner | Project Manager |
|-----------|---------------|-----------------|
| Primary focus | Product value and backlog | Project delivery and governance |
| Accountability | What gets built and why | When, how much, who builds it |
| Framework | Scrum / Agile | PMBOK / PRINCE2 / SAFe / Hybrid |
| Time horizon | Ongoing (product lifecycle) | Fixed (project start to close) |
| Stakeholder management | Business + customers | All stakeholders (exec, vendors, team) |
| Success metric | Business outcomes, user adoption | On-time, on-budget, in-scope delivery |
| Backlog ownership | Yes — owns and prioritises | No — consumes the schedule |
| Budget ownership | Typically no | Yes |
| Reports to | CPO / Head of Product | PMO / Programme Manager |

In SAFe (Scaled Agile Framework):
The roles are further separated — the Product Manager owns the Program Backlog (features), while the Product Owner owns the Team Backlog (stories). A Release Train Engineer (RTE) performs the coordination role similar to a Programme Manager.

Overlap in hybrid organisations:
In smaller organisations or hybrid projects, one person may wear both hats. However, the tension between “build the right thing” (PO) and “build the thing right on time” (PM) means separating the roles leads to better outcomes in complex projects.

↥ back to top

Q. What is Change Management in project management?

Change Management in project management refers to the structured process for requesting, evaluating, approving, and implementing changes to a project's agreed scope, schedule, cost, or quality baseline. It prevents unauthorised scope creep while allowing necessary changes to be incorporated in a controlled manner.

Change Control Process:

  1. Change Request (CR) Raised — Any stakeholder identifies a need to change the agreed baseline and submits a formal Change Request document.
  2. Impact Assessment — The PM, Tech Lead, and relevant SMEs assess the impact on scope, schedule, cost, quality, risks, and resources.
  3. Change Control Board (CCB) Review — The CCB (sponsor, PM, key stakeholders) reviews the CR and impact assessment.
  4. Decision — Approved / Rejected / Deferred with documented rationale.
  5. Implementation — Approved changes are planned, resourced, and executed.
  6. Baseline Update — Project baselines (scope, schedule, cost) are formally updated to reflect the approved change.
  7. Communication — All affected stakeholders are notified of the approved change.

Types of Change:

| Type | Description |
|------|-------------|
| Scope change | Addition, removal, or modification of deliverables |
| Schedule change | Change to milestones or delivery dates |
| Cost change | Budget adjustment or reallocation |
| Quality change | Modification to acceptance criteria or standards |
| Risk-driven change | Change triggered by a materialised risk or issue |

Agile Change Management:
In Agile teams, change is embraced by design — the product backlog is continuously refined and reprioritised. However, formal change control still applies to:

Organisational Change Management (OCM):
Beyond project change control, OCM addresses the people side of change — managing resistance, communication, training, and adoption when systems, processes, or structures change. Frameworks include Prosci ADKAR (Awareness, Desire, Knowledge, Ability, Reinforcement) and Kotter's 8-Step Model.

↥ back to top

Q. What are DORA metrics and why do they matter?

DORA metrics (DevOps Research and Assessment) are a set of four key metrics developed by the DORA research team (now part of Google Cloud) that measure the performance of software delivery and operational stability. They are the industry-standard benchmark for engineering team effectiveness.

The Four DORA Metrics:

| Metric | Definition | Measures |
|--------|------------|----------|
| Deployment Frequency (DF) | How often the team deploys to production | Speed / Throughput |
| Lead Time for Changes (LT) | Time from code commit to running in production | Speed / Efficiency |
| Change Failure Rate (CFR) | % of deployments that cause a production failure | Quality / Stability |
| Mean Time to Restore (MTTR) | Time to restore service after a production failure | Resilience / Recovery |

Performance Tiers (DORA 2023 Report):

| Tier | Deployment Frequency | Lead Time | Change Failure Rate | MTTR |
|------|----------------------|-----------|---------------------|------|
| Elite | On-demand (multiple/day) | < 1 hour | 0–5% | < 1 hour |
| High | Daily to weekly | 1 day – 1 week | 5–10% | < 1 day |
| Medium | Weekly to monthly | 1 week – 1 month | 10–15% | 1 day – 1 week |
| Low | < Once per month | > 6 months | 46–60% | > 6 months |

Why DORA metrics matter:

How to improve DORA metrics:

| Metric | Key Improvement Levers |
|--------|------------------------|
| Deployment Frequency | Trunk-based development, feature flags, automated CI/CD pipelines |
| Lead Time for Changes | Smaller PRs, faster code reviews, automated testing, pipeline optimisation |
| Change Failure Rate | Shift-left testing, canary/blue-green deployments, SAST/DAST in pipeline |
| MTTR | Observability (logs, traces, metrics), runbooks, chaos engineering, on-call rotation |
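
As a sketch of how these metrics can be derived from raw delivery data, the snippet below computes all four from a hypothetical deployment log (field names and figures are invented for illustration):

```js
// Hypothetical deployment log over a 3-day window.
const deploys = [
  { at: '2024-06-03T10:00Z', commitAt: '2024-06-03T08:00Z', failed: false },
  { at: '2024-06-04T15:00Z', commitAt: '2024-06-04T09:00Z', failed: true, restoredAt: '2024-06-04T15:45Z' },
  { at: '2024-06-05T11:00Z', commitAt: '2024-06-05T10:30Z', failed: false },
];

const hours = (a, b) => (new Date(b) - new Date(a)) / 36e5;
const failures = deploys.filter((d) => d.failed);

console.log({
  deploymentFrequency: deploys.length / 3, // deploys per day over the window
  avgLeadTimeHours:
    deploys.reduce((sum, d) => sum + hours(d.commitAt, d.at), 0) / deploys.length,
  changeFailureRate: failures.length / deploys.length, // ≈ 0.33
  mttrHours:
    failures.reduce((sum, d) => sum + hours(d.at, d.restoredAt), 0) / failures.length,
});
```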

DORA + SPACE Framework:
Modern engineering organisations combine DORA (flow and stability metrics) with the SPACE framework (Satisfaction, Performance, Activity, Communication, Efficiency) to get a more holistic view of developer productivity beyond just deployment speed.

↥ back to top

# 21. SOFTWARE ARCHITECT INTERVIEW QUESTIONS


Q. What is the difference between Software Architecture and Software Design?

| Aspect | Software Architecture | Software Design |
|--------|-----------------------|-----------------|
| Scope | High-level structure of the entire system | Low-level, component/module level decisions |
| Focus | System-wide quality attributes (scalability, availability, security) | Implementation details (classes, algorithms, patterns) |
| Stakeholders | Business, management, all technical teams | Primarily development team |
| Changes | Expensive and hard to change | Easier to refactor |
| Artifacts | Architectural diagrams, ADRs, system blueprints | Class diagrams, sequence diagrams, pseudocode |

Key principle: Architecture defines the skeleton of a system; design fleshes it out. A wrong architectural decision is far more costly to fix than a wrong design decision.

↥ back to top

Q. What are the key quality attributes (Non-Functional Requirements) an architect must consider?

Quality attributes drive architectural decisions and trade-offs:

| Attribute | Description | Common Tactics |
|-----------|-------------|----------------|
| Scalability | Handle increasing load | Horizontal scaling, partitioning, load balancing |
| Availability | System uptime (e.g., 99.99%) | Redundancy, failover, health checks |
| Performance | Response time, throughput | Caching, async processing, CDN |
| Security | Protect data and access | Auth/authz, encryption, input validation |
| Maintainability | Ease of changes | Modularity, low coupling, documentation |
| Testability | Ability to verify behavior | Dependency injection, contract tests |
| Observability | System insight in production | Logging, metrics, distributed tracing |
| Resilience | Recover from failures | Circuit breaker, retry, bulkhead |
| Portability | Deploy across environments | Containerization, IaC, 12-Factor App |

Trade-offs are inevitable — optimizing for all attributes simultaneously is impossible. An architect must communicate and justify trade-offs clearly.

↥ back to top

Q. Explain Microservices Architecture and when you would NOT use it.

Microservices Architecture decomposes an application into small, independently deployable services, each owning its data and communicating over APIs (REST, gRPC, messaging).

Benefits:

When NOT to use microservices:

  1. Small team / startup — operational overhead outweighs benefits; a well-structured monolith is faster to iterate
  2. Unclear domain boundaries — premature decomposition leads to chatty, tightly coupled services (distributed monolith)
  3. Low operational maturity — microservices require CI/CD, container orchestration, observability, and service mesh
  4. Latency-sensitive workflows — network hops between services add latency compared to in-process calls
  5. Shared database requirements — if services must share a database, you lose the key benefit of independent data ownership

Martin Fowler's advice: Start with a monolith, identify seams, then extract services as boundaries become clear.

↥ back to top

Q. What is the CAP Theorem and how does it influence database selection?

CAP Theorem states that a distributed system can guarantee at most two of the following three properties simultaneously:

- Consistency (C): every read receives the most recent write or an error
- Availability (A): every request receives a (non-error) response, without guaranteeing it contains the most recent write
- Partition tolerance (P): the system continues to operate despite network partitions

Since network partitions are unavoidable in distributed systems, the real trade-off is CP vs AP:

| Choice | Example Databases | Use Case |
|--------|-------------------|----------|
| CP (Consistent + Partition-tolerant) | HBase, MongoDB (strong mode), Zookeeper | Financial transactions, inventory systems |
| AP (Available + Partition-tolerant) | Cassandra, CouchDB, DynamoDB (eventual) | Social feeds, shopping carts, DNS |

PACELC extends CAP by also considering latency vs consistency trade-offs when there is no partition.

↥ back to top

Q. What is Event-Driven Architecture (EDA) and what are its benefits and challenges?

Event-Driven Architecture is a design paradigm where components communicate by producing and consuming events (immutable records of something that happened) via a message broker (Kafka, RabbitMQ, AWS SNS/SQS).

Core patterns:

Benefits:

Challenges:

↥ back to top

Q. How would you design a system for 10 million concurrent users?

This is a classic scalability question. A structured answer covers multiple layers:

1. Load Balancing

2. Stateless Application Tier

3. Caching Strategy

4. Database Tier

5. Asynchronous Processing

6. Observability

7. Resilience

↥ back to top

Q. What is Domain-Driven Design (DDD) and how does it relate to microservices?

Domain-Driven Design (DDD) is a software development approach that aligns the software model with the business domain using a shared language (Ubiquitous Language) between developers and domain experts.

Core DDD concepts:

| Concept | Description |
|---------|-------------|
| Domain | The business problem space being solved |
| Ubiquitous Language | Shared vocabulary used in code and conversation |
| Bounded Context | Explicit boundary within which a model applies |
| Aggregate | Cluster of domain objects treated as a single unit with a root entity |
| Entity | Object with a unique identity persisting over time |
| Value Object | Immutable object defined by its attributes, no identity |
| Domain Event | Something significant that happened in the domain |
| Repository | Abstraction for data access |
| Anti-Corruption Layer (ACL) | Translates between bounded contexts to avoid model leakage |

DDD → Microservices mapping:

↥ back to top

Q. What are the SOLID principles and how do they apply at the architectural level?

| Principle | Statement | Architectural Application |
|-----------|-----------|---------------------------|
| S — Single Responsibility | A module should have one reason to change | Each microservice owns one business capability |
| O — Open/Closed | Open for extension, closed for modification | Use plugin architectures, feature flags, strategy pattern |
| L — Liskov Substitution | Subtypes must be substitutable for base types | Enforce API contracts; a service upgrade must not break consumers |
| I — Interface Segregation | Clients shouldn't depend on interfaces they don't use | Design fine-grained APIs; avoid fat shared libraries |
| D — Dependency Inversion | Depend on abstractions, not concretions | Use dependency injection; services depend on interfaces, not implementations |

At scale, SOLID applies across service and module boundaries, not just within a class hierarchy.

↥ back to top

Q. Explain the Strangler Fig Pattern for legacy system migration.

The Strangler Fig Pattern (Martin Fowler) enables incremental migration from a legacy monolith to a new architecture without a risky “big-bang” rewrite.

Steps:

  1. Place a facade/proxy (API Gateway or reverse proxy) in front of the legacy system
  2. Identify a bounded feature slice to migrate first
  3. Build the new service implementing that feature
  4. Redirect traffic for that feature from the legacy system to the new service via the facade
  5. Repeat for remaining features until the legacy system is “strangled” (replaced entirely)
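
A minimal facade sketch, assuming Express and the http-proxy-middleware package; the service names and routes are hypothetical:

```js
const express = require('express');
const { createProxyMiddleware } = require('http-proxy-middleware');

const app = express();

// Migrated slice: /orders is now served by the new microservice
app.use('/orders', createProxyMiddleware({
  target: 'http://orders-service:8080',
  changeOrigin: true,
}));

// Everything else still goes to the legacy monolith
app.use('/', createProxyMiddleware({
  target: 'http://legacy-monolith:8000',
  changeOrigin: true,
}));

app.listen(3000);
```

As more slices migrate, routes move from the second rule to dedicated proxies until the monolith receives no traffic at all.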

Benefits:

Challenges:

↥ back to top

Q. What is the difference between Orchestration and Choreography in microservices?

| Aspect | Orchestration | Choreography |
|--------|---------------|--------------|
| Control | Central orchestrator directs all steps | Each service knows what to do and reacts to events |
| Coupling | Services coupled to orchestrator | Services coupled only to events |
| Visibility | Easy to trace flow in one place | Flow is distributed; harder to visualize |
| Failure handling | Orchestrator manages compensating transactions | Each service handles its own failures |
| Example | Saga orchestrated via a workflow engine (Temporal, AWS Step Functions) | Saga choreographed via domain events on Kafka |
| Best for | Complex business workflows with many decision points | Simple, decoupled event flows |

Saga Pattern uses both approaches to handle distributed transactions without two-phase commit (2PC), using compensating transactions to undo completed steps on failure.

↥ back to top

Q. How do you approach API design as an architect? What makes a good API?

Characteristics of a well-designed API:

  1. Consistency — uniform naming conventions, error formats, pagination patterns
  2. Backward compatibility — never break existing consumers; use versioning (/v1/, /v2/)
  3. Principle of Least Surprise — behavior should match developer expectations
  4. Idempotency — PUT, DELETE, and retry-safe POST operations should be idempotent
  5. Proper HTTP semantics — correct status codes, verbs, and headers
  6. Pagination and filtering — for collection endpoints; avoid unbounded responses
  7. Security by design — authentication, authorization, rate limiting, input validation at API layer
  8. Documentation — OpenAPI/Swagger specs as the contract; treat docs as code

API styles and trade-offs:

| Style | Best For | Drawbacks |
|-------|----------|-----------|
| REST | CRUD resources, broad client support | Over-fetching/under-fetching |
| GraphQL | Flexible queries, mobile clients | Complex caching, N+1 queries |
| gRPC | High-performance internal services | Less browser-friendly, steeper learning curve |
| WebSockets | Real-time bidirectional communication | Stateful, harder to scale |
| AsyncAPI / Messaging | Event-driven, async workflows | Eventual consistency, debugging complexity |

↥ back to top

Q. What is the 12-Factor App methodology?

The 12-Factor App is a methodology for building cloud-native, scalable, and maintainable software-as-a-service applications:

| Factor | Principle |
|--------|-----------|
| 1. Codebase | One codebase tracked in VCS; many deploys |
| 2. Dependencies | Explicitly declare and isolate dependencies |
| 3. Config | Store config in the environment, not code |
| 4. Backing Services | Treat databases, queues, etc. as attached resources |
| 5. Build, Release, Run | Strictly separate build and run stages |
| 6. Processes | Execute the app as stateless processes |
| 7. Port Binding | Export services via port binding |
| 8. Concurrency | Scale out via the process model |
| 9. Disposability | Fast startup, graceful shutdown |
| 10. Dev/Prod Parity | Keep development and production as similar as possible |
| 11. Logs | Treat logs as event streams |
| 12. Admin Processes | Run admin tasks as one-off processes |
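
As a small illustration of Factor 3 (Config), a Node.js service can read its settings from the environment rather than from code (the variable names here are hypothetical examples):

```js
// Factor 3: config lives in the environment, not in the codebase.
// DATABASE_URL and PORT are example variable names.
const config = {
  databaseUrl: process.env.DATABASE_URL,
  port: Number(process.env.PORT) || 3000,
};

console.log(`Starting on port ${config.port}`);
```
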
↥ back to top

Q. What are Architecture Decision Records (ADRs) and why are they important?

An Architecture Decision Record (ADR) is a short document that captures an important architectural decision, including its context, the decision made, and the consequences.

Standard ADR structure (Nygard format):

**ADR-001: Use PostgreSQL as the primary datastore**

**Status**

Accepted

**Context**

We need a relational datastore that supports ACID transactions for our
order management system. The team has strong SQL expertise.

**Decision**

We will use PostgreSQL 15 hosted on AWS RDS with read replicas.

**Consequences**

- Strong consistency and ACID guarantees for order records
- Need to plan for schema migrations (use Flyway/Liquibase)
- Horizontal write scaling requires sharding strategy in the future
- Licensing: open source, no cost concerns

Why ADRs matter:

↥ back to top

Q. How do you ensure security in a distributed system architecture?

Defense in Depth — apply security controls at every layer:

1. Network Layer

2. Identity and Access

3. Application Layer

4. Data Layer

5. Operational

↥ back to top

Q. What is the difference between Horizontal and Vertical scaling?

| Aspect | Vertical Scaling (Scale Up) | Horizontal Scaling (Scale Out) |
|--------|-----------------------------|--------------------------------|
| Approach | Add more CPU/RAM/storage to existing machine | Add more machines/instances |
| Limit | Physical hardware ceiling | Virtually unlimited (cloud) |
| Cost | Exponential cost growth at high end | Linear cost growth |
| Downtime | Often requires restart | No downtime with rolling deployment |
| Complexity | Simple — no code changes | Requires stateless design, load balancing |
| Best for | Databases (initially), legacy apps | Stateless web/API tiers, microservices |

Architectural implication: Design stateless services from the start to enable horizontal scaling. Offload state to distributed caches (Redis) or databases.

↥ back to top

Q. Explain the Circuit Breaker pattern and how it improves system resilience.

The Circuit Breaker pattern (Michael Nygard, Release It!) prevents a failing service from causing cascading failures across a distributed system.

States:

```text
[CLOSED] ──(failures exceed threshold)──► [OPEN] ──(timeout expires)──► [HALF-OPEN]
    ▲                                                                          │
    └──────────────────(probe request succeeds)────────────────────────────────┘
```

Implementation in Node.js (using opossum):

```js
const CircuitBreaker = require('opossum');

// Placeholder for the protected call, e.g. an HTTP request to a downstream API
async function callExternalService() { /* ... */ }

const breaker = new CircuitBreaker(callExternalService, {
  timeout: 3000,                 // if the call takes > 3s, count it as a failure
  errorThresholdPercentage: 50,  // open when 50% of requests fail
  resetTimeout: 10000,           // after 10s, move to HALF-OPEN and probe
});

breaker.fallback(() => ({ data: null, source: 'cache' }));
breaker.on('open', () => console.warn('Circuit OPEN — calls short-circuited'));
breaker.on('halfOpen', () => console.info('Circuit HALF-OPEN — probing'));
breaker.on('close', () => console.info('Circuit CLOSED — service recovered'));
```

Related patterns: Retry with exponential backoff, Bulkhead (isolate failure domains), Timeout (don't wait forever).

↥ back to top

Q. What is CQRS and Event Sourcing? When should you use them?

CQRS (Command Query Responsibility Segregation):

Separates the write model (Commands — mutate state) from the read model (Queries — retrieve state).

```text
Client ──► Command Handler ──► Write DB (normalized)
                                     │
                              (event/projection)
                                     ▼
Client ──► Query Handler ──────► Read DB (denormalized, optimized per use case)
```

Benefits: Read and write models can scale independently; read models can be tailored for specific query patterns.

Event Sourcing:

Instead of storing the current state, store a sequence of events that led to that state. Current state is derived by replaying events.

```js
const events = [
  { type: 'OrderPlaced',     orderId: '1', items: [/* ... */] },
  { type: 'PaymentReceived', orderId: '1', amount: 99.99 },
  { type: 'OrderShipped',    orderId: '1', trackingId: 'XYZ' },
];

// Current state is derived by replaying the events over an initial state,
// where applyEvent is an application-defined reducer per event type:
// const currentState = events.reduce(applyEvent, initialState);
```

Benefits: Complete audit trail, temporal queries (“what was the state at time T?”), event replay for projections.

When to use:

↥ back to top

Q. How would you handle distributed transactions across microservices?

Traditional ACID transactions don't span service boundaries. Distributed transaction strategies:

1. Saga Pattern (recommended)

A sequence of local transactions, each publishing an event/message to trigger the next step. On failure, compensating transactions undo completed steps.

```text
PlaceOrder → ReserveInventory → ProcessPayment → ShipOrder
                                      │ FAIL
                              ◄── ReleaseInventory ◄── CancelOrder (compensate)
```
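
A minimal orchestration-style saga sketch in JavaScript; every step and compensation function here is a hypothetical stand-in for a local transaction in a separate service:

```js
// Hypothetical local-transaction steps and their compensations (stubs).
const reserveInventory = async (order) => { /* local tx in inventory service */ };
const releaseInventory = async (order) => { /* compensation */ };
const processPayment   = async (order) => { /* local tx in payment service */ };
const refundPayment    = async (order) => { /* compensation */ };
const shipOrder        = async (order) => { /* local tx in shipping service */ };
const cancelShipment   = async (order) => { /* compensation */ };

async function placeOrderSaga(order) {
  const steps = [
    { run: reserveInventory, undo: releaseInventory },
    { run: processPayment,   undo: refundPayment },
    { run: shipOrder,        undo: cancelShipment },
  ];
  const completed = [];
  try {
    for (const step of steps) {
      await step.run(order);
      completed.push(step);
    }
  } catch (err) {
    // Undo completed local transactions in reverse order
    for (const step of completed.reverse()) {
      await step.undo(order);
    }
    throw err;
  }
}
```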

2. Two-Phase Commit (2PC) — generally avoided

3. Outbox Pattern — ensures atomicity between database writes and message publishing: the outgoing event is written to an "outbox" table in the same local transaction as the business data, and a separate relay process (or CDC) reads the outbox and publishes to the broker.

4. Idempotent Consumers — all event consumers must be idempotent (processing the same event twice produces the same result) to handle at-least-once delivery semantics.

↥ back to top

Q. What is the difference between REST, GraphQL, and gRPC? When do you choose each?

| Factor | REST | GraphQL | gRPC |
|--------|------|---------|------|
| Protocol | HTTP/1.1 | HTTP/1.1 | HTTP/2 |
| Data format | JSON/XML | JSON | Protocol Buffers (binary) |
| Schema | OpenAPI (optional) | Strongly typed schema | Strongly typed .proto |
| Fetching | Fixed endpoint, fixed response | Client-specified fields | Fixed RPC methods |
| Performance | Good | Good (can be N+1 issue) | Excellent (binary, multiplexing) |
| Browser support | Native | Native | Limited (grpc-web required) |
| Streaming | Limited (SSE) | Subscriptions | Native bidirectional streaming |

Choose REST when: Building public APIs, browser clients, broad interoperability needed.

Choose GraphQL when: Mobile/frontend clients need flexible queries, multiple clients with different data needs, BFF (Backend for Frontend) layer.

Choose gRPC when: Internal service-to-service communication, high-throughput/low-latency requirements, polyglot environments needing strong contracts.

↥ back to top

Q. How do you approach observability in a distributed system?

Observability is built on three pillars:

1. Logs

2. Metrics

3. Traces

Implementation in Node.js:

```js
const { NodeSDK } = require('@opentelemetry/sdk-node');
const { getNodeAutoInstrumentations } = require('@opentelemetry/auto-instrumentations-node');
const { OTLPTraceExporter } = require('@opentelemetry/exporter-trace-otlp-http');

const sdk = new NodeSDK({
  traceExporter: new OTLPTraceExporter({ url: 'http://jaeger:4318/v1/traces' }),
  instrumentations: [getNodeAutoInstrumentations()],
});
sdk.start();
```

4. Alerting

↥ back to top

Q. What is the difference between a Monolith, SOA, and Microservices?

| Aspect | Monolith | SOA | Microservices |
|--------|----------|-----|---------------|
| Deployment unit | Single deployable artifact | Large services (shared ESB) | Many small independent services |
| Communication | In-process function calls | SOAP/ESB (heavy middleware) | REST/gRPC/messaging (lightweight) |
| Data | Shared database | Often shared database | Each service owns its data |
| Team structure | Feature teams span the whole app | Cross-functional but large | Small, autonomous teams (two-pizza rule) |
| Scalability | Scale everything together | Coarse-grained scaling | Fine-grained scaling per service |
| Operational complexity | Low | Medium | High |
| Best for | Early-stage products, small teams | Enterprise integration scenarios | Large-scale, complex domains with many teams |

Key insight: Microservices solve organizational complexity (Conway's Law) as much as they solve technical scaling problems.

↥ back to top

Q. How do you handle backward compatibility and versioning in APIs?

Versioning strategies:

| Strategy | Example | Pros | Cons |
|----------|---------|------|------|
| URI versioning | /v1/users | Simple, explicit, cacheable | URL proliferation |
| Header versioning | Accept: application/vnd.api.v2+json | Clean URLs | Less discoverable |
| Query parameter | /users?version=2 | Easy testing | Unconventional |
| Content negotiation | Accept-Version: 2 | REST-purist approach | Less tooling support |

Backward-compatible changes (non-breaking): adding new optional fields or query parameters, adding new endpoints, adding new enum values that clients are told to ignore, and relaxing validation rules.

Breaking changes (require a new version): removing or renaming fields or endpoints, changing field types or semantics, making optional inputs required, and tightening validation rules.

Best practices: version from day one, publish a clear deprecation policy and timeline, announce removals ahead of time (e.g., via deprecation headers and changelogs), and run contract tests against every supported version.
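
A small URI-versioning sketch with Express, where routes and response fields are hypothetical; v1 keeps its original contract while v2 introduces a breaking field rename:

```js
const express = require('express');
const app = express();

// v1 keeps its original contract for existing consumers
app.get('/v1/users/:id', (req, res) => {
  res.json({ id: req.params.id, name: 'Ada Lovelace' });
});

// v2 introduces a breaking change (renamed field) under a new version path
app.get('/v2/users/:id', (req, res) => {
  res.json({ id: req.params.id, fullName: 'Ada Lovelace' });
});

app.listen(3000);
```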

↥ back to top

Q. What is Infrastructure as Code (IaC) and why is it important for architects?

Infrastructure as Code treats infrastructure provisioning (servers, networks, databases, load balancers) as software — defined in version-controlled files, applied automatically.

Key tools:

| Tool | Type | Approach |
|------|------|----------|
| Terraform | Declarative IaC | Multi-cloud, state-based |
| AWS CloudFormation | Declarative IaC | AWS-native |
| Pulumi | Imperative IaC | Real programming languages |
| Ansible | Configuration management | Agentless, YAML playbooks |
| Helm | Kubernetes package manager | Templated K8s manifests |

Why architects care:

↥ back to top

Q. How do you evaluate and select a technology stack for a new project?

A structured evaluation framework:

1. Business constraints

2. Technical fit

3. Team capabilities

4. Long-term considerations

5. Proof of Concept

Common pitfall: Choosing technology because it's trending rather than because it solves the actual problem better than alternatives.

↥ back to top

Q. What is Zero Downtime Deployment and what strategies enable it?

Zero Downtime Deployment ensures users experience no service interruption during a new version release.

Strategies:

1. Rolling Deployment

2. Blue-Green Deployment

3. Canary Release

4. Feature Flags

Database migration challenge: schema changes must remain compatible with both the old and the new application version running side by side. This is typically handled with the expand–contract (parallel change) pattern: first add the new schema alongside the old (expand), then migrate code and data, and finally remove the old schema (contract).

↥ back to top

Q. How do you design for failure? What is chaos engineering?

Design for Failure Principles:

Chaos Engineering (pioneered by Netflix's Chaos Monkey) is the practice of intentionally injecting failures into a system in production (or production-like environments) to proactively discover weaknesses.

Chaos Engineering process:

  1. Define steady state (key business/technical metrics)
  2. Hypothesize: “If we kill this instance, steady state will be maintained”
  3. Inject failure (instance termination, network latency, disk full, CPU spike)
  4. Observe and compare against steady state
  5. Fix discovered weaknesses; repeat

Tools: Netflix Chaos Monkey, Gremlin, LitmusChaos (Kubernetes), AWS Fault Injection Simulator

Key distinction: Chaos Engineering is not random destruction — it is a controlled scientific experiment to build confidence in system resilience.

↥ back to top