Here are five basic steps I use before I let myself get fancy.
1. Start with the symptom, not the guess
Write down exactly what’s wrong:
- What URL / service is failing?
- What’s the exact error message or status code?
- Is it slow, broken, or returning the wrong data?
Getting crisp on the symptom keeps you from chasing ghosts and lets you test whether you’ve actually fixed the problem.
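One way to keep the symptom crisp is to record it as data that a later check can run against. The sketch below is a hypothetical Python record: the `Symptom` class, its field names, and the example URL are illustrative assumptions, not part of any tool. It only checks status code, latency, and an optional string expected in the response body, so subtler "wrong data" cases still need a human look.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Symptom:
    """A written-down symptom, plus the checks that would show it is gone."""
    target: str                             # failing URL or service
    observed: str                           # exact error message or status code seen
    kind: str                               # "slow", "broken", or "wrong data"
    expected_status: Optional[int] = None   # status a healthy response should return
    max_latency_s: Optional[float] = None   # latency you would no longer call "slow"
    expected_text: Optional[str] = None     # string the correct data should contain

    def is_fixed(self, status: int, latency_s: float, body: str = "") -> bool:
        """True only if a fresh observation meets every expectation that was set."""
        if self.expected_status is not None and status != self.expected_status:
            return False
        if self.max_latency_s is not None and latency_s > self.max_latency_s:
            return False
        if self.expected_text is not None and self.expected_text not in body:
            return False
        return True


# Example with made-up values: a checkout page returning 502.
checkout = Symptom(
    target="https://shop.example.com/checkout",
    observed="HTTP 502 Bad Gateway",
    kind="broken",
    expected_status=200,
    max_latency_s=1.0,
)
print(checkout.is_fixed(status=502, latency_s=0.3))  # False
print(checkout.is_fixed(status=200, latency_s=0.4))  # True
```

After each attempted fix, re-run the same check. "Fixed" then means the written expectation holds, not that the page seems better.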
2. Check the basics: power, network, and resources
The unglamorous stuff solves more issues than we admit:
- Is the server up? Does it answer ping, and can you SSH/RDP into it?
- CPU / RAM / disk: Are you pegged at 100% or out of space?
- Network: Any recent firewall, VLAN, DNS, or load balancer changes?
Half of “mysterious” errors boil down to “the box is out of something.”
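A quick scripted version of the "out of something" check can be written with Python's standard library alone. This is a minimal sketch under a few assumptions. The thresholds are arbitrary starting points. A 1-minute load average at or above the core count is used as a rough proxy for "pegged". Memory is left out, because the standard library has no portable way to read it.

```python
import os
import shutil


def check_resources(path="/", disk_threshold=0.90, load_threshold=1.0):
    """Return warnings for a nearly full disk or a CPU run queue at or above the core count.

    The thresholds are arbitrary starting points, not recommendations.
    Memory is not checked: the standard library has no portable way to read it.
    """
    warnings = []

    usage = shutil.disk_usage(path)
    used_fraction = usage.used / usage.total
    if used_fraction >= disk_threshold:
        warnings.append(f"disk {path} is {used_fraction:.0%} full")

    try:
        load_1m, _, _ = os.getloadavg()  # Unix-only; missing on Windows
    except (AttributeError, OSError):
        load_1m = None
    if load_1m is not None:
        cores = os.cpu_count() or 1
        if load_1m / cores >= load_threshold:
            warnings.append(f"1-minute load average {load_1m:.2f} on {cores} cores")

    return warnings


for warning in check_resources():
    print("WARN:", warning)
```

An empty result does not clear the box. It only means these two numbers look normal right now.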
3. Look at logs where the error actually appears
Don’t grep the universe. Start as close to the symptom as possible:
- Web server logs (Nginx, Apache, IIS)
- App/service logs
- System logs (journalctl, Event Viewer, etc.)
You’re looking for patterns: timestamps that match, repeated stack traces, or a specific component failing over and over.
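A small filter can surface matching timestamps and repeated failures without reading every line. It keeps only error lines near the incident and groups near-identical messages. This sketch assumes a hypothetical log format in which each line starts with an ISO-8601 timestamp such as 2024-05-01T12:00:03, followed by a level such as ERROR. Real Nginx, journald, or Event Viewer output will need its own parsing.

```python
import re
from collections import Counter
from datetime import datetime, timedelta

# Assumed format: "2024-05-01T12:00:03 LEVEL message". Adapt to your real logs.
TIMESTAMP_LEN = len("2024-05-01T12:00:03")


def recurring_errors(lines, incident_time, window=timedelta(minutes=5), top=5):
    """Return the most common ERROR messages logged within `window` of the incident."""
    counts = Counter()
    for line in lines:
        try:
            ts = datetime.fromisoformat(line[:TIMESTAMP_LEN])
        except ValueError:
            continue  # lines without a leading timestamp (e.g. stack-trace frames) are skipped
        if abs(ts - incident_time) > window or "ERROR" not in line:
            continue
        # Collapse numbers (ids, ports, durations) so near-identical messages group together.
        message = re.sub(r"\d+", "N", line[TIMESTAMP_LEN:].strip())
        counts[message] += 1
    return counts.most_common(top)


log = [
    "2024-05-01T12:00:01 INFO request id=17 ok",
    "2024-05-01T12:00:03 ERROR upstream timeout after 3000 ms (id=18)",
    "2024-05-01T12:00:04 ERROR upstream timeout after 3001 ms (id=19)",
    "2024-05-01T12:20:00 ERROR disk quota exceeded",
]
print(recurring_errors(log, datetime(2024, 5, 1, 12, 0, 5)))
# → [('ERROR upstream timeout after N ms (id=N)', 2)]
```

The 12:20 error is left out because it falls outside the five-minute window. That is the point: start near the symptom and widen the window only if nothing shows up.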
4. Reproduce in the smallest way possible
Try to trigger the issue with the least moving parts:
- Use curl or Postman instead of the full app.
- Hit the backend directly instead of going through all proxies.
- Try from another machine or network to rule out client issues.
If you can’t reproduce it on demand, you’re debugging stories instead of behavior.
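A scripted request makes the reproduction repeatable. The hypothetical `probe` function below sends a single GET with Python's standard urllib and reports the status code, elapsed time, and any error. It treats HTTP error responses as data rather than exceptions. The optional `host_header` argument lets you target a backend's IP address while still presenting the public hostname a proxy would forward. Whether your backend actually routes on the Host header is an assumption to verify.

```python
import time
import urllib.error
import urllib.request


def probe(url, host_header=None, timeout=5.0):
    """Send one GET request and return (status, elapsed_seconds, error).

    HTTP error responses (4xx/5xx) are returned as statuses, not raised.
    Connection-level failures return status None with the reason as text.
    """
    headers = {"Host": host_header} if host_header else {}
    request = urllib.request.Request(url, headers=headers)
    start = time.monotonic()
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            status, error = response.status, None
    except urllib.error.HTTPError as exc:
        # The server answered, just not with success.
        status, error = exc.code, str(exc.reason)
    except OSError as exc:
        # URLError (DNS failure, refused connection) and socket timeouts land here.
        status, error = None, str(getattr(exc, "reason", exc))
    return status, time.monotonic() - start, error
```

Call it once with the public URL and once with a backend address, for example probe("http://10.0.0.12:8080/health", host_header="app.example.com"), where the address, port, and path are placeholders. If only the proxied call fails, the proxy or load balancer layer is the likely culprit. Running the same calls from another machine covers the client side.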
5. Change one thing at a time and write it down
As you test fixes:
- Change one variable at a time (config, version, setting).
- Note what you changed and the result.
- Roll back when something makes it worse.
This turns troubleshooting from panic into a mini-experiment, and it makes the postmortem way easier.
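When the change is a configuration value you can measure, the same discipline can be scripted. In this sketch, `Experiment` and its `measure` callable are hypothetical. `measure` stands in for whatever number you track (error rate, p95 latency), and lower is assumed to be better. Any change that makes the number worse is rolled back, and every attempt is logged.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

_MISSING = object()  # marks a key that did not exist before the change


@dataclass
class Experiment:
    """Apply one config change at a time, log it, and roll back regressions."""
    config: Dict[str, Any]
    measure: Callable[[Dict[str, Any]], float]  # lower is assumed to be better
    log: List[str] = field(default_factory=list)

    def try_change(self, key: str, value: Any) -> bool:
        """Change a single key; keep it unless the measurement gets worse."""
        before = self.measure(self.config)
        old = self.config.get(key, _MISSING)
        self.config[key] = value
        after = self.measure(self.config)

        kept = after <= before
        if not kept:  # worse than before: restore the previous state
            if old is _MISSING:
                del self.config[key]
            else:
                self.config[key] = old

        old_repr = "<unset>" if old is _MISSING else repr(old)
        outcome = "kept" if kept else "rolled back"
        self.log.append(f"{key}: {old_repr} -> {value!r} | {before} -> {after} | {outcome}")
        return kept


# Toy measurement for illustration only: errors fall as the pool grows
# and jump when the timeout drops below 5 seconds.
def fake_error_rate(cfg):
    rate = 10.0 / cfg["pool_size"]
    if cfg["timeout_s"] < 5:
        rate += 3.0
    return rate


exp = Experiment(config={"pool_size": 10, "timeout_s": 30}, measure=fake_error_rate)
exp.try_change("pool_size", 20)  # 1.0 -> 0.5: kept
exp.try_change("timeout_s", 2)   # 0.5 -> 3.5: rolled back
print("\n".join(exp.log))
print(exp.config)                # {'pool_size': 20, 'timeout_s': 30}
```

The log doubles as the raw material for the postmortem. Because each entry names exactly one key, cause and effect stay readable.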