beShera is a social EdTech platform built on a microservices architecture. User onboarding, course management, social interactions, examinations, and AI-powered learning features all run as separate services communicating over APIs.
This was the most complex system I've tested. Here's what I learned.
Why Microservices Testing is Different
In a monolith, data flows through one codebase. You break something, you see it. In microservices, a bug in the notification service might only appear when the exam service fires an event, which only happens under specific user state conditions.
The real bugs are at the integration points.
My Testing Strategy
1. Map the Service Boundaries First
Before writing a single test case, I spent two days mapping:
- Which service owns which data
- How services communicate (REST, events, both?)
- What happens when one service is slow or down
This map became the source of truth for all my edge-case tests.
2. Contract Testing for APIs
When Service A calls Service B, I tested:
- Does Service B return what Service A expects?
- What happens when Service B returns an unexpected field?
- What happens when Service B adds a new required field?
I found a real bug here: the exam service expected user_id as an integer, but the user service started returning it as a string after a migration. Tests passed in isolation. The integration broke.
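A minimal sketch of the kind of consumer-side check that would have caught this, written in plain Python rather than a dedicated contract-testing tool. The contract dictionary, the `email` field, and the sample payloads are illustrative assumptions; only the `user_id` integer-versus-string mismatch comes from the actual bug.

```python
# Hypothetical contract: the fields the exam service (the consumer) relies on
# in a user-service response, with the types it expects.
EXAM_SERVICE_EXPECTS = {"user_id": int, "email": str}


def contract_violations(payload: dict, expected: dict) -> list[str]:
    """Return human-readable contract violations (empty list if none)."""
    problems = []
    for field, expected_type in expected.items():
        if field not in payload:
            problems.append(f"missing field: {field}")
            continue
        value = payload[field]
        # Exact type match, so "42" (and True) are rejected where int is expected.
        if type(value) is not expected_type:
            problems.append(
                f"{field}: expected {expected_type.__name__}, "
                f"got {type(value).__name__}"
            )
    # Extra fields are tolerated: a consumer should ignore what it does not use.
    return problems


# Illustrative payloads: before and after the migration described above.
before_migration = {"user_id": 42, "email": "a@example.com"}
after_migration = {"user_id": "42", "email": "a@example.com", "locale": "en"}

print(contract_violations(before_migration, EXAM_SERVICE_EXPECTS))
# → []
print(contract_violations(after_migration, EXAM_SERVICE_EXPECTS))
# → ['user_id: expected int, got str']
```

The point of running a check like this against the provider's real responses is that it fails at the boundary, where each service's own unit tests stay green.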
3. End-to-End Critical Paths
I defined 5 critical user journeys and tested them end-to-end:
- New user onboarding: register → verify email → complete profile → enroll in first course
- Course completion: watch all modules → pass exam → receive certificate
- Social interaction: post → comment → like → notification delivery
- AI-assisted learning: start AI session → complete → progress saved
- Account recovery: forgot password → reset → re-login → session intact
Each journey touched 3 to 7 microservices. Any one of them could break the entire flow.
4. Chaos Testing (Informal)
I didn't have access to formal chaos engineering tools, but I simulated failure manually:
- What happens to an active exam session if the network drops mid-test?
- What if the user closes the browser during payment?
- What if the same user logs in from two devices simultaneously?
The simultaneous session test found a real bug: both sessions stayed active, and actions from one session sometimes overwrote the other's progress.
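The overwrite can be reproduced in a few lines. The sketch below uses a hypothetical in-memory `ProgressStore` in place of the real progress service, and shows one common guard against lost updates: a version check (optimistic concurrency). That guard is an assumption for illustration, not necessarily how the bug was fixed.

```python
class StaleWriteError(Exception):
    """Raised when a session saves on top of a version it never saw."""


class ProgressStore:
    """Hypothetical in-memory stand-in for one user's progress record."""

    def __init__(self):
        self.version = 0
        self.completed = set()

    def read(self):
        # Return a copy so each session edits its own local state.
        return self.version, set(self.completed)

    def save_unchecked(self, completed):
        # Last write wins: whatever the other session saved is silently lost.
        self.completed = set(completed)
        self.version += 1

    def save_checked(self, completed, seen_version):
        # Optimistic concurrency: reject writes based on an outdated read.
        if seen_version != self.version:
            raise StaleWriteError(
                f"saw v{seen_version}, store is at v{self.version}"
            )
        self.completed = set(completed)
        self.version += 1


def simulate(checked):
    """Two devices load the same state, then each saves different progress."""
    store = ProgressStore()
    version_a, progress_a = store.read()
    version_b, progress_b = store.read()
    progress_a.add("module-1")
    progress_b.add("module-2")

    rejected = False
    if checked:
        store.save_checked(progress_a, version_a)
        try:
            store.save_checked(progress_b, version_b)
        except StaleWriteError:
            rejected = True  # device B must reload and retry
    else:
        store.save_unchecked(progress_a)
        store.save_unchecked(progress_b)
    return store.completed, rejected


print(simulate(checked=False))  # module-1 is lost
print(simulate(checked=True))   # second write is rejected instead
```

Without the check, device B's save erases module-1. With it, the stale write is refused and the client has a chance to reload and merge.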
The Bug I'm Most Proud of Finding
During social interaction testing, I found that unliking a post you'd never liked still decremented the like count, so the counter could drop below zero: -1, -2, and so on.
The root cause: the frontend sent the unlike API call optimistically before confirming whether the user had actually liked the post. The backend didn't validate current like state before decrementing.
Impact: High. A public-facing counter showing negative numbers is both a UX bug and a data integrity issue.
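A minimal sketch of the missing backend validation, assuming likes are stored per user rather than as a bare counter. The `PostLikes` class and its method names are hypothetical, not beShera's actual code.

```python
class PostLikes:
    """Hypothetical like store: tracks who liked a post, not just a number."""

    def __init__(self):
        self._liked_by = set()

    @property
    def count(self):
        # Derived from the set of likers, so it can never be negative.
        return len(self._liked_by)

    def like(self, user_id):
        """Record a like; return False if this user had already liked."""
        if user_id in self._liked_by:
            return False
        self._liked_by.add(user_id)
        return True

    def unlike(self, user_id):
        """Remove a like; return False (and change nothing) if there was none."""
        if user_id not in self._liked_by:
            return False
        self._liked_by.remove(user_id)
        return True


post = PostLikes()
post.unlike("user-1")   # never liked: ignored
post.like("user-2")
post.like("user-2")     # duplicate like: ignored
post.unlike("user-2")
post.unlike("user-2")   # repeated unlike: ignored
print(post.count)
# → 0
```

With the state check on the server, an optimistic unlike call from the frontend becomes harmless: the worst case is a no-op.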
Key Takeaways
- Test services in isolation first, then together: find the unit bugs before the integration bugs
- Idempotency matters: every write operation should be safe to call twice
- Event-driven bugs are time-delayed: not everything shows up immediately
- State management across services is the hardest problem: test every state transition explicitly
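To make the idempotency takeaway concrete, here is a small sketch of a write that is safe to call twice, using a client-supplied idempotency key. The `EnrollmentService` class, its fields, and the key scheme are all illustrative assumptions, not the platform's real API.

```python
import uuid


class EnrollmentService:
    """Hypothetical write endpoint that deduplicates retried requests."""

    def __init__(self):
        self._results = {}     # idempotency key -> result of the first call
        self.enrollments = []  # the actual writes

    def enroll(self, idempotency_key, user_id, course_id):
        # A retry with a key we have already seen returns the stored result
        # instead of performing the write a second time.
        if idempotency_key in self._results:
            return self._results[idempotency_key]

        self.enrollments.append({"user_id": user_id, "course_id": course_id})
        result = {
            "status": "enrolled",
            "enrollment_number": len(self.enrollments),
        }
        self._results[idempotency_key] = result
        return result


service = EnrollmentService()
key = str(uuid.uuid4())  # generated once by the client, reused on retry

first = service.enroll(key, user_id=42, course_id="course-101")
retry = service.enroll(key, user_id=42, course_id="course-101")

print(first == retry)            # same response both times
print(len(service.enrollments))  # only one write happened
```

A test for any write operation can follow the same shape: send the request twice with the same key and assert that the stored state changed only once.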

