I am doing internship in an advertising company, i already implemented a tool to gather all the necessary data form facebook & import them on a database.
Now i am trying to manipulate that data, first by making some test cases & getting some results. The tables grow by 35k rows per day so after a month of using the tool i noticed that the query i use to obtain the sum of certain adcreatives clicks is starting to slow down.
i am asking if the query i use can speed up if i use it with a join & how.
here is the query i have for the sum of clicks per adcreative (with adgroup_id,campaign_id as connect to the other tables):
<!-- language-all: lang-sql -->
SELECT t1.adgroup_id, t1.campaign_id, t1.creative_ids, SUM( t2.clicks ) AS clicks
FROM adgroups t1, adgroup_stats t2
WHERE t1.adgroup_id = t2.adgroup_id
GROUP BY t1.creative_ids
ORDER BY clicks DESC
currently the query takes 3 secs to complete on a dedicated server, i guess after 6 months it will be at more than 60 secs or so as the tables grow.
edit: here is the explain of the query ( although this is the first time i actually use it & not so sure what it means)
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE t2 ALL PRIMARY NULL NULL NULL 671549 Using temporary; Using filesort
1 SIMPLE t1 ref PRIMARY PRIMARY 8 fbads.t2.adgroup_id 358 Using index
That looks like a full table scan, & with that rapid growth small performance changes won’t make a huge difference on the long run. You need a different approach.
I would calculate aggregates for the previous months (days, etc) with a cron job, & when you need stats then merge that with the fresh results (using the query you already wrote). That why you only have to scan the fresh record, which means the queries is going to be fast.
Alternatively, you can keep up-to-date counters in the adgroups table, & update them on each click. Not sure if mysql is the right tool for this, I can recommend MongoDB, it can do very swift atomic increments on fields, & though it doesn’t donate you as strict guarantees (ACID) as a relational database, in this case it’s not a problem, ad clicks aren’t mission critical data, nobody is going to complain, if you lose < 0.01% percent of click information.