mark.watero.us

Wordpress stuff, a statistics plugin, and jello

Articles found for the word ‘statistics’

The future of kStats Reloaded statistics for Wordpress

4 comments

I’m not sure how to say this, so I think I’ll just spit it out: there are some major changes on the horizon for kStats. These are good changes, and I will go into more detail farther on, but they are changes that may negatively impact current users of the plugin. Out of 6,000+ downloads, this could mean anywhere between 2 and 3 dozen web sites! </tongue-in-cheek>

Humble Beginnings

As I’ve mentioned a few hundred times, kStats began as a simple fork of StatPress Reloaded to speed things up and create a plugin more suited to larger applications of Wordpress.

Due to the nature of how StatPress chose to store statistics and report on them, a StatPress table had a tendency to grow extremely large, extremely fast. Not only did this approach create a severe bottleneck when visitors tried to access your site, but it was sadly even worse when you tried to retrieve the resulting data on the administrative side.

Due to this, I figured the best approach for kStats was to restructure the existing format to use aggregated data: much smaller, more accessible records that give you a fast look at real-time numbers combined with past totals.
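To make the aggregation idea concrete, here is a minimal sketch in Python (not the plugin's own PHP). The record layout and the names `raw_hits` and `aggregate_daily` are assumptions made up for illustration; they are not kStats' actual schema or functions.

```python
from collections import Counter
from datetime import datetime

# Hypothetical raw hit records: (timestamp, visitor id, kind of hit).
raw_hits = [
    ("2009-12-28 09:15:00", "visitor-a", "pageview"),
    ("2009-12-28 09:16:30", "visitor-a", "pageview"),
    ("2009-12-28 11:02:10", "googlebot", "spider"),
    ("2009-12-29 08:45:00", "visitor-b", "pageview"),
]

def aggregate_daily(hits):
    """Collapse raw hits into one small counter per (day, kind)."""
    totals = Counter()
    for timestamp, _visitor, kind in hits:
        day = datetime.strptime(timestamp, "%Y-%m-%d %H:%M:%S").date().isoformat()
        totals[(day, kind)] += 1
    return totals

daily_totals = aggregate_daily(raw_hits)
for (day, kind), count in sorted(daily_totals.items()):
    print(day, kind, count)
```

Four raw rows shrink to three counters here; on a busy site the ratio is far more dramatic, which is where the speed comes from. The cost is that everything except the count is thrown away.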

Statistics are important

This approach worked great at first. kStats was fast, it recorded data quickly and it reported it to the site administrator just as fast. But over time, it’s starting to show its weakness. It gives you the numbers, but what about the meat of your stats? What if you want to see what happened last month? What about 3 weeks ago? Or 16 weeks ago?

I started to develop a new aggregate process which would store almost four times as much data, allowing the existing system to remain in place, and the reports to grow in features considerably. I could still sense impending doom with this approach though. What about further down the road? Would there be a ceiling to how much data I could store in an aggregate form? Would I just move the bloat from one storage method to another, winding up in the same pit that StatPress did?

Learning from our mistakes

In the end, I think StatPress was on the right track. I think I was too, but instead of forking down a completely different road, I think the best approach would be to learn a lesson from both approaches, and to turn that into a single new approach.

I was drowning in workload over the holidays, while simultaneously trying to keep up with family functions. I didn’t get a lot of time to work on kStats, and I must apologize to anybody I left hung out to dry as a result. There were some bug fixes desperately needed, and while they’re out now, they should’ve been out then. However, while I didn’t have as much time to work on kStats as I would have liked, I did find some time here and there in the mornings and evenings to get some reading in.

It’s all about the database

I consider myself fairly competent when it comes to MySQL and, more importantly, database design. The more I learn though (as with life), the more I realize I don’t know. SQL on its own, without the various layers that bring it to life on the web, is an art form.

I’ve been studying up on the subject, because I know it’s a very important part of this design process, as well as the design and development of a few other projects I’m working on right now, and I think I’ve drawn an outline for a new database structure that will allow kStats to break the mold a little when it comes to statistics recording and analysis for Wordpress.

Using a combination of MySQL engines, such as the ARCHIVE engine, and a completely redesigned schema, I think I can bring the speed without sacrificing any data. There’s no reason you shouldn’t be able to look up historical periods of time and see exactly what occurred. There’s no reason you shouldn’t be able to click a button and produce a detailed report using any piece of recorded data as the focus (as opposed to being restricted to predefined reports). This is what statistics are for after all, right?
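As a rough illustration of what "any piece of recorded data as the focus" could mean, here is a small Python sketch. It assumes every hit is kept with all of its fields; the field names and the `report` helper are hypothetical and say nothing about how the new kStats schema will actually look.

```python
from collections import Counter

# Hypothetical full-detail hit records, nothing summarized away.
hits = [
    {"day": "2009-12-28", "url": "/about", "referrer": "google.com", "browser": "Firefox"},
    {"day": "2009-12-28", "url": "/kstats", "referrer": "google.com", "browser": "Chrome"},
    {"day": "2009-12-29", "url": "/kstats", "referrer": "wordpress.org", "browser": "Firefox"},
]

def report(records, focus):
    """Count hits grouped by whichever recorded field the user picks."""
    return Counter(record[focus] for record in records).most_common()

by_url = report(hits, "url")
by_referrer = report(hits, "referrer")
print(by_url)
print(by_referrer)
```

The same function answers "which pages?", "which referrers?" and "which browsers?" without a predefined report for each, which is only possible because the raw detail was kept.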

The bad news

While the plugin is still technically considered a beta, I would rather not release a version that destroys all the data that’s been recorded to date. This may be unavoidable to some degree though.

While data that currently exists in the raw table of your kStats install is easy enough to move over to the new format with no data loss, the data that’s already been summarized might not be so easy to transfer.

Before I dive headlong into this new strategy I’m going to do everything in my power to determine how we can avoid such a loss. It may be as easy as providing a legacy utility which will store the historical data in a different format, and retrieve it when building certain reports. The problem I’m facing is that this aggregate data may be unusable in producing certain reports that require a particular level of accuracy.
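One example of a report that summarized data cannot answer accurately is unique visitors over a longer period. The Python sketch below uses made-up visits and is only an illustration of the general problem, not a statement about which kStats reports are affected.

```python
# Hypothetical raw visits: (day, visitor id).
raw_visits = [
    ("2009-12-01", "alice"),
    ("2009-12-01", "bob"),
    ("2009-12-02", "alice"),
    ("2009-12-02", "carol"),
]

# What a daily summary keeps: a unique-visitor count per day, identities discarded.
visitors_by_day = {}
for day, visitor in raw_visits:
    visitors_by_day.setdefault(day, set()).add(visitor)
daily_unique_counts = {day: len(seen) for day, seen in visitors_by_day.items()}

# With the raw rows, the true number of unique visitors for the period is easy.
true_period_uniques = len({visitor for _day, visitor in raw_visits})

# With only the daily counts, the best available figure is their sum,
# which counts alice once for each day she visited.
summed_daily_uniques = sum(daily_unique_counts.values())

print(true_period_uniques, summed_daily_uniques)
```

Once the identities are gone, there is no way to tell from the daily counts alone how much overlap there was between days.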

It’s tough. I have a project board dedicated to brainstorming this, so I’ll be sure to keep everybody up to date on how it’s going to go down, well before it does.

Written by mark

December 29th, 2009 at 8:02 pm

Data loss during nightly cleanup

5 comments

I’ve had reports of it for many releases now – data is lost from the totals table, for no apparent reason. I’ve seen it in the search terms people use to find my site and the kStats plugin, and I’ve had bug reports and numerous comments about it. Until now, I’ve never seen it happen and therefore had no basis to start my search in fixing it.

Well, after an entire week of testing 0.6.0 every day on my own blog to try and make it kStats’ first major bug-free release, I thought I was good to go. So I tagged it and dropped it in the repository earlier this evening, thinking “Finally!”.

Not an hour later, it happened.

My data was gone. All except for yesterday and the new hits coming in after midnight. Nothing was missing from the raw table or charts; something happened to truncate the totals (aggregate) information. The problem is, I still have no better clue what it was. It happened right after the wp-cron event for kStats was tripped, so I presume it has to be occurring during the cleanup routine. So I went in and debugged every query that was going on.

Every single query ran fine. I tripped the cron 7 times. Each and every time worked perfectly. So what the HECK is going on?

I’ve been banging my head against the walls and my dining room table for a few hours now trying to track it down. What good is a stats plugin that arbitrarily truncates data?! The only possibility I’ve come up with so far that seems remotely feasible is that it happens when a search engine or bot trips the cleanup – maybe they’re hitting a page and leaving so fast that it’s only getting part way into the process before it gets canceled?

So I just finished tagging 0.6.1, hopefully before too many people have already upped to 0.6.0. I added (and probably should have since day one anyways) ignore_user_abort() to the primary cleanup function to ensure that even if the page is exited before the script is done, it will follow through. I had this in the collector, I’m not sure why I didn’t put it there before.
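If the half-finished cleanup theory is right, a second line of defense (a different technique from ignore_user_abort(), and not something 0.6.1 does) is to make the cleanup all-or-nothing with a database transaction. The sketch below uses Python and an in-memory SQLite database purely for illustration; the `raw_hits` and `totals` tables and the `abort_midway` flag are hypothetical.

```python
import sqlite3

def nightly_cleanup(conn, abort_midway=False):
    """Rebuild the totals table from the raw hits as a single transaction."""
    with conn:  # commits on success, rolls back if an exception escapes
        conn.execute("DELETE FROM totals")
        if abort_midway:
            # Stand-in for the request dying partway through the routine.
            raise RuntimeError("simulated abort halfway through cleanup")
        conn.execute(
            "INSERT INTO totals (day, hits) "
            "SELECT day, COUNT(*) FROM raw_hits GROUP BY day"
        )

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE raw_hits (day TEXT, url TEXT)")
conn.execute("CREATE TABLE totals (day TEXT PRIMARY KEY, hits INTEGER)")
conn.executemany(
    "INSERT INTO raw_hits VALUES (?, ?)",
    [("2009-11-22", "/a"), ("2009-11-22", "/b"), ("2009-11-23", "/a")],
)
conn.execute("INSERT INTO totals VALUES ('2009-11-22', 2)")
conn.commit()

try:
    nightly_cleanup(conn, abort_midway=True)
except RuntimeError:
    pass
rows_after_abort = conn.execute("SELECT day, hits FROM totals ORDER BY day").fetchall()

nightly_cleanup(conn)
rows_after_success = conn.execute("SELECT day, hits FROM totals ORDER BY day").fetchall()
print(rows_after_abort, rows_after_success)
```

The aborted run leaves the old totals untouched because the delete is rolled back along with everything else. Whether this is available in practice depends on the storage engine supporting transactions.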

If that doesn’t fix the problem, I’m going to turn the cron feature off and make it a utility on the Options page until the reason is tracked down. I would like to release a stable version. I can’t until this is worked out.

*sigh*

Written by mark

November 24th, 2009 at 2:04 am

Posted in Plugins, kStats Reloaded

kStats 0.4.6 – statistics reloaded

one comment

It’s been a little while longer since my last release than I had anticipated, so once again I’ve made myself feel a little rushed on getting this out there.

There’s not a whole lot of fancy new stuff here, mostly bug fixes and some minor upgrades to the interface to allow for a little more user customization. One of the biggest changes you may note is how the statistics overview is organized. There are four new color-coded dialogs at the top which display the all-time total for each area of interest (visitors, pageviews, spiders, etc.) alongside the current daily total.

By removing this from the table, I’ve cleared up some room to include the last few months’ aggregate data, but chose to continue displaying today’s and yesterday’s information here as well. Eventually the totals may be split up by months and days in two separate tables, but for now I decided to focus on the monthly data here, while letting the bar chart handle the majority of the daily totals.

New Options

You now have more control over how many recent hits kStats will display. It defaults to 20, though you can set it as high as 500.

The ignore list is now configurable through the administrative options page. The old way of directly editing the ignore.dat definitions file seemed too obscure and this allows for much easier control.

Top 20 Charts

The top 20 charts are no longer displayed side by side, to accommodate smaller monitors with less room to fit all that information. In addition, you can now select the ‘view all’ option to see all the stored information for each area.

Feedback, as always…

…is appreciated! I’d like to know what changes you like, any that you don’t, and suggestions for future releases.

Somehow my schedule has gone from ‘late night’ to ‘senior citizen’ in the past couple weeks, and I’m about ready to pass out, so I’ll check my grammar in the morning.

Written by mark

November 14th, 2009 at 2:00 am

kStats for Wordpress version 0.4.1

one comment

kStats Reloaded 0.4.1 for Wordpress is now available for download!

This version finally brings a lot of stability to the platform! A variety of bugs have been fixed, including any possibility of fatal errors (I’ve been running it non-stop since last night in full debug to catch any possibles) that may have caused serious issues for people who have downloaded previous beta versions of the plugin.

View the changelog for the short list.

New Interface

The statistics reporting and general administration interface has gone through a complete overhaul. I’ve added a new tabbed navigation system to help organize the data in easy to use groupings, and included a new options page which will grow with the plugin.

Along with the new interface, and thanks to the administrative options page, I decided to go ahead and integrate the StatPress Reloaded conversion script. This way people upgrading from StatPress will now find it easy to convert all their data into kStats. If you don’t need to convert, or have already done so, simply delete the file /kstats-reloaded/lib/convert.php and the option removes itself.

Using a similar style interface, there is now an integrated upgrade process that will degrade gracefully all the way back to 0.1.0. Now no matter what version you started with, you will always be able to update to the latest with minimal fuss.

Database Changes

I completely restructured the way the aggregate tables store information. This has not only improved the performance of the plugin during regular operation, but it also allows for much more scalability and future options, including the ability to go back a year and view your monthly statistics, or go back a few months to view daily/weekly activity.

The potential here is for kStats to continue toward being the fastest statistics recording plugin available, without sacrificing any data.

Nightly cleanup

Since version 0.1.0, I’ve been reinventing my own wheel, and with little success, ironically. My nightly aggregate function was a complete rewrite of another function I was already using to gather data from the aggregate tables and raw tables to display real-time information. The problem was that I spent so much time focusing on how to make sure all the new data was dropped in the right spots that I completely overlooked the possibility of using the same function (with slight additions) for both purposes.
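The shape of that idea, one gathering function shared by the live display and the nightly pass, can be sketched like this in Python. The function names and the dictionary and list standing in for the aggregate and raw tables are assumptions for illustration only, not the plugin's real code.

```python
from collections import Counter

def summarize(stored_totals, raw_hits):
    """Merge stored per-day totals with counts from not-yet-aggregated raw hits."""
    merged = Counter(stored_totals)
    for day, _url in raw_hits:
        merged[day] += 1
    return merged

def live_report(stored_totals, raw_hits):
    """Real-time view: compute the merged numbers without changing anything."""
    return dict(summarize(stored_totals, raw_hits))

def nightly_cleanup(stored_totals, raw_hits):
    """Nightly pass: persist the same merged numbers, then empty the raw list."""
    merged = summarize(stored_totals, raw_hits)
    stored_totals.clear()
    stored_totals.update(dict(merged))
    raw_hits.clear()

totals = {"2009-11-01": 10}
raw = [("2009-11-01", "/a"), ("2009-11-02", "/b"), ("2009-11-02", "/c")]

before = live_report(totals, raw)
nightly_cleanup(totals, raw)
print(before, totals, raw)
```

Because both paths go through `summarize`, the numbers shown during the day and the numbers written at night cannot drift apart.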

This has been fixed in 0.4.1, and the nightly cleanup is now not only working flawlessly (I say that with great trepidation, mixed with confidence), but has been clocked on various test runs with tables up to 30,000 rows in well under half a second. That means whoever trips the nightly cleanup shouldn’t have to blink twice before the page loads.

Odds and ends

I’ve combed over every file and every function, and there are a few that I’ve deprecated in favor of better, more versatile functions. There was also a bug where stats were being misreported at night between 12:00am and the cleanup being triggered, because the datetime class was reporting the wrong day for a period of an hour. That has been fixed as well.

Let me know what you think! I feel good about the path from here to version 1.0.0 myself, but what I think doesn’t matter anywhere near as much as what you think!

Written by mark

November 3rd, 2009 at 4:36 pm

Statistics Analysis for Wordpress, feature requests?

15 comments

On the road to releasing version 1.x.x of kStats Reloaded for Wordpress, I would like to hear back from the community on what features they like and don’t like in a statistics analysis plugin.

A friendly spider

Pageviews, spiders, search terms, mumble, mumble…

We’ve all used them, they’re great for a multitude of purposes on a variety of levels. Some people just like to know that they’re getting hits, and how many. Some people are trying to make a living from their online endeavours, such as blogging, and need to know who is viewing their content, where those visitors came from, and how they’re getting around the site, in order to provide this information to potential advertisers or investors.

Some are extremely fast, but may not offer the variety of information that’s being sought out. Some record literally every last detail they can squeeze out, but may store this information in a format that slows your site down to a crawl when combined with other plugins or dynamic content.

Give me your feedback!

What information do you find the most useful? What features just get in your way? Post your comments below, and I’ll review every one for possible inclusion into a future release of kStats.

(kStats Reloaded can be found here.)

Written by mark

October 30th, 2009 at 5:30 pm