Saturday, February 28, 2009

Why I write this blog

After 1.5 years at speeddate.com, I feel that I should write down the technical things I have experienced, learned, and implemented before I forget them.
It will be a great resource for me to review from time to time, and it might help other startup companies that are at a similar stage and facing similar challenges.

A quick list of what I should write about:
  • openfire stops doing its job properly several hours after it gets restarted.
  • database tuning and changing roadmap.
  • database load became heavy for the first time, so we upgraded the hardware.
  • too many simultaneous connections on the database: what happened? what should we do?
  • db7 IO increased exponentially in the last 3 months, even though our traffic stayed flat.
  • db3 load is too high, mostly due to member browsing queries.
  • first time database sharding, a vertical one.
  • we have over 4.5M members, and some of them are paying for premium features, so we need to improve our service by reducing single points of failure: database high availability.
  • time to use memcached, and three major ways we use it.
  • take advantage of Amazon EC2
  • targeting a launch of speeddate.com in one month, and we have three issues: security issues (SQL injection/XSS attacks); performance issues: we can only support up to 30 people online and dating concurrently.
  • we began to use JS much more than before: is this a good approach? is it a good approach for speeddate.com with only two front-end guys?
  • controller section refactoring and what we should avoid next time.
  • MMS and openfire were introduced in V4, and what we should avoid next time.
  • Do I have the ability to introduce technology like MMS and openfire into speeddate.com?
  • Take advantage of CDN
  • Take advantage of Amazon S3
  • How we can introduce a payment system in two weeks with one person, compared with another company that spent 6 months with multiple engineers to launch its payment system.
  • Changing web hosting company without downtime.
  • Changing load balancer and all servers without downtime.
  • Improve team productivity, buy some good books.
  • we use over 70 charts daily to guide us and guard us.
  • How do you know whether you need to add more servers? Use munin.
  • Alert engineers when something is down: nagios.
  • database backup server hard drive is full: will the slave on it still be good, or corrupted?
  • during migration (db7), IO and load on one database server were pretty high; after disabling the slave for a while (24 hours, on db9), enabling the slave io_thread caused db7 IO to go super high, and db7 could not handle site traffic anymore.
  • how to identify deer and make them more active so they can attract more hunters to pay the lesson fee to hunt.