It will be a great resource for me to review from time to time, and it might help other startups that are at a similar stage and facing similar challenges.
A quick list of topics I should write about:
- Openfire stops doing its job properly several hours after it gets restarted.
- Database tuning and changing the roadmap.
- Database load became heavy for the first time; we upgraded the hardware.
- Too many simultaneous connections on the database: what happened, and what should we do?
- db7 IO increased exponentially in the last 3 months, even though our traffic stayed flat.
- db3 load is too high, mostly due to member browsing queries.
- Our first database sharding, a vertical one.
- We have over 4.5M members, and some of them pay for premium features, so we need to improve our service by reducing single points of failure: database high availability.
- Time to use memcached, and the three major ways we use it.
- Take advantage of Amazon EC2
- Targeting a launch of speeddate.com in one month, and we have three issues: security issues (SQL injection/XSS attacks) and a performance issue: we can only support up to 30 people online and dating concurrently.
- We began to use JS much more than before. Is this a good approach? Is it a good approach for speeddate.com with only two front-end guys?
- Controller section refactoring, and what we should avoid next time.
- MMS and Openfire were introduced in V4, and what we should avoid next time.
- Do I have the ability to introduce technology like MMS and Openfire into speeddate.com?
- Take advantage of CDN
- Take advantage of Amazon S3
- How we can introduce a payment system in two weeks with one person, compared with another company that spent 6 months and multiple engineers launching its payment system.
- Changing web hosting company without downtime.
- Changing load balancer and all servers without downtime.
- Improve team productivity, buy some good books.
- We use over 70 charts daily to guide us and guard us.
- How do you know whether you need to add more servers? Use Munin.
- Alert engineers when something is down: Nagios.
- The database backup server's hard drive is full: will the slave on it still be good, or is it corrupted?
- During migration (db7), one database server's IO and load were pretty high. After disabling the slave for a while (24 hours, on db9), enabling the slave io_thread pushed db7 IO so high that db7 could not handle site traffic anymore.
- How to identify deer and make them more active so they attract more hunters to pay the lesson fee to hunt.