mark.watero.us

WordPress stuff, a statistics plugin, and jello

Articles found in the ‘kStats Reloaded’ pigeonhole

The future of kStats Reloaded statistics for WordPress

4 comments

I’m not sure how to say this, so I think I’ll just spit it out: there are some major changes on the horizon for kStats. These are good changes, and I will go into more detail further on, but they are changes that may negatively impact current users of the plugin. Out of 6,000+ downloads, this could mean anywhere between 2 and 3 dozen web sites! </tongue-in-cheek>

Humble Beginnings

As I’ve mentioned a few hundred times, kStats began as a simple fork of StatPress Reloaded, meant to speed things up and create a plugin more suited to larger applications of WordPress.

Due to the way StatPress chose to store statistics and report on them, a StatPress table tended to grow extremely large, extremely fast. Not only did this approach create a severe bottleneck when visitors tried to access your site, but it was sadly even worse when you tried to retrieve the resulting data on the administrative side.

Because of this, I figured the best approach for kStats was to restructure the existing format to use aggregated data: much smaller, more accessible records that give you a fast look at real-time numbers combined with past totals.
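To make the idea of aggregated data concrete, here is a minimal sketch of rolling raw hit rows up into daily totals. It is written in Python rather than the plugin's PHP, and the row layout (a timestamp plus a URL) and the function name `aggregate_hits` are assumptions for illustration, not the actual kStats schema.

```python
from collections import Counter
from datetime import datetime


def aggregate_hits(raw_hits):
    """Collapse raw (timestamp, url) rows into one count per (day, url)."""
    totals = Counter()
    for timestamp, url in raw_hits:
        # Keep only the calendar day; the time of day is discarded.
        day = datetime.fromisoformat(timestamp).date().isoformat()
        totals[(day, url)] += 1
    return dict(totals)


# Hypothetical raw rows, one per page view.
raw_hits = [
    ("2009-12-28 09:15:00", "/"),
    ("2009-12-28 17:40:00", "/"),
    ("2009-12-28 18:02:00", "/about"),
    ("2009-12-29 08:30:00", "/"),
]
daily_totals = aggregate_hits(raw_hits)
```

Four raw rows become three summary rows. The saving grows with traffic, but anything not kept in the summary key (here, the time of day) is gone for good.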

Statistics are important

This approach worked great at first. kStats was fast: it recorded data quickly and reported it to the site administrator just as fast. But over time, it has started to show its weakness. It gives you the numbers, but what about the meat of your stats? What if you want to see what happened last month? What about 3 weeks ago? Or 16 weeks ago?

I started to develop a new aggregate process which would store almost four times as much data, allowing the existing system to remain in place, and the reports to grow in features considerably. I could still sense impending doom with this approach though. What about further down the road? Would there be a ceiling to how much data I could store in an aggregate form? Would I just move the bloat from one storage method to another, winding up in the same pit that StatPress did?

Learning from our mistakes

In the end, I think StatPress was on the right track. I think I was too, but instead of forking down a completely different road, I think the best approach would be to learn a lesson from both approaches, and to turn that into a single new approach.

I was drowning in workload over the holidays, while simultaneously trying to keep up with family functions. I didn’t get a lot of time to work on kStats, and I must apologize to anybody I left hung out to dry as a result. There were some bug fixes desperately needed, and while they’re out now, they should’ve been out then. However, while I didn’t have as much time to work on kStats as I would have liked, I did find some time here and there in the mornings and evenings to get some reading in.

It’s all about the database

I consider myself fairly competent when it comes to MySQL and, more importantly, database design. The more I learn though (as with life), the more I realize I don’t know. SQL on its own, without the various layers that bring it to life on the web, is an art form.

I’ve been studying up on the subject, because I know it’s a very important part of this design process, as well as the design and development of a few other projects I’m working on right now. I think I’ve drawn an outline for a new database structure that will allow kStats to break the mold a little when it comes to statistics recording and analysis for WordPress.

Using a combination of MySQL engines, such as the ARCHIVE engine, and a completely redesigned schema, I think I can bring the speed without sacrificing any data. There’s no reason you shouldn’t be able to look up historical periods of time and see exactly what occurred. There’s no reason you shouldn’t be able to click a button and produce a detailed report using any piece of recorded data as the focus (as opposed to being restricted to predefined reports). This is what statistics are for after all, right?

The bad news

While the plugin is still technically considered a beta, I would rather not release a version that destroys all the data that’s been recorded to date. This may be unavoidable to some degree though.

While data that currently exists in the raw table of your kStats install is easy enough to move over to the new format with no data loss, the data that’s already been summarized might not be so easy to transfer.

Before I dive headlong into this new strategy I’m going to do everything in my power to determine how we can avoid such a loss. It may be as easy as providing a legacy utility which will store the historical data in a different format, and retrieve it when building certain reports. The problem I’m facing is that this aggregate data may be unusable in producing certain reports that require a particular level of accuracy.

It’s tough. I have a project board dedicated to brainstorming this, so I’ll be sure to keep everybody up to date on how it’s going to go down, well before it does.

Written by mark

December 29th, 2009 at 8:02 pm

kStats bugfix release 0.7.4

one comment

It’s been a while since I’ve written anything on here or been able to do much work on kStats, and I must apologize for the absence. The holiday season came, and paid projects piled up for a short period, which kept me far away from getting some much needed updates out.

Now that the holidays are nearing an end, I’m still finding myself under a heavy workload. While putting some food on the table is of course the priority, I am going to do my best to balance in some free time to continue development of kStats. I have some big plans that I hope to start implementing soon, which I will write more about in a future post.

The fixes

I have a new permanent note written on my project board: ‘Check compatibility’. With the last release I chose to use PHP’s built-in filter_var() function, in lieu of reinventing the wheel for a validation routine. That function, however, wasn’t introduced until PHP 5.2.0, so the plugin didn’t work properly for anybody running an earlier version of PHP. Fixed.

There was also a problem with the charts not displaying correctly on new installs. Despite the setting on the options page, a chart would only display as many days as it had data for, and didn’t fill in the blanks. This has been fixed, and the chart should now display in full, whether or not there is data for a particular day.
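The chart fix comes down to building the list of days from the configured window rather than from the rows that happen to exist. A minimal sketch of that idea follows, in Python rather than the plugin's PHP. The function name `fill_chart_days` and its parameters are hypothetical and not taken from the kStats source.

```python
from datetime import date, timedelta


def fill_chart_days(counts_by_day, end_day, days_to_show):
    """Return one (day, count) pair per day in the window, using 0 for gaps."""
    start_day = end_day - timedelta(days=days_to_show - 1)
    window = [start_day + timedelta(days=offset) for offset in range(days_to_show)]
    # Look each day up in the recorded data; missing days fall back to zero.
    return [(day, counts_by_day.get(day, 0)) for day in window]


# A new install with data for only two days, charted over a 5-day window.
recorded = {date(2009, 12, 27): 14, date(2009, 12, 29): 9}
chart = fill_chart_days(recorded, end_day=date(2009, 12, 29), days_to_show=5)
```

The result always has `days_to_show` entries, so the chart keeps its full width even on a brand new install.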

Download Here   Changelog

Written by mark

December 29th, 2009 at 7:17 pm

Posted in Plugins, kStats Reloaded


5,000 downloads, and a little mod-security


kStats Reloaded just passed 5,000 downloads!

Thank you to everyone who has been part of the development, by supporting the plugin through using it, writing about it in your blogs, sending me feedback via comments, email or bug reports, and anything I missed mentioning! It’s been a lot of fun, and I hope to see it continue growing in popularity.

I’ve got a list a mile long of new features that are up and coming, and I’m currently working on yet more improvements to the database structure, to ensure that this is not only an accurate and feature-rich plugin, but a lil’ speed demon too. Again, thank you all, and don’t be shy about sending me your criticisms or suggestions!

Mod What?

I recently installed Mod Security on my server in an attempt to reduce the number of attacks on my blog and the ridiculous referrer spam that I’ve been getting lately. I’m sorry, but I have no interest in taking a screenshot of the latest kStats and showing everybody that 5 out of 10 of my top referrers are coming from some stupid subdomain of a**f***d****.com (you’ll notice the most recent screenshot has the top referrers chart collapsed for a reason).

While it deals mainly in HTTP and regular expressions, both of which I’m familiar with, the syntax is completely new to me. I hope I haven’t turned on any rules that result in odd behaviour for anyone. If you notice anything out of the ordinary while trying to perform common tasks such as leaving comments, please let me know so that I can fix it asap. I already had to rewrite a few rules because I managed to make it believe using phpMyAdmin was some form of attempted SQL injection attack…

Written by mark

December 4th, 2009 at 12:46 am

HTML Entities bugfix 0.7.2


As I was writing my kStats introductory post for 0.7.1 I concurrently received a bug report which should be fixed now.

The problem was with the htmlentities() PHP function I was using. All information coming from the database should be trustworthy, due to sanitization on the way in. However, I figured it couldn’t hurt to wrap it on the way out again, and make sure on both sides of the equation.

Since PHP 5.2.3, htmlentities() has accepted a fourth argument (double_encode) which, if set to false, won’t encode already encoded HTML entities. By running this on data to be displayed I figured it would help catch any mistakes that slipped by on the way in and ensure no malicious JavaScript could be injected into your dashboard. The problem is I forgot to read the changelog on the function and didn’t realize at first that the argument was only available on 5.2.3 and up, causing an error to be displayed for anybody running an earlier version.

The wrapper has been updated with a version check. If you’re running 5.2.3 or later, it runs with the flag set. If you’re running an earlier version, it simply decodes the string first, then encodes it again, to make sure all HTML entities are caught.
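The decode-then-encode fallback can be sketched as follows. This is a rough Python analogue using the standard `html` module, not the plugin's PHP wrapper; `html.escape()` only handles a handful of characters, where htmlentities() covers every named entity, and the function name `encode_once` is made up for illustration.

```python
import html


def encode_once(text):
    """Decode any existing entities, then encode, so nothing is encoded twice."""
    return html.escape(html.unescape(text), quote=True)


# Input that is already partly encoded: the "&amp;" must not become "&amp;amp;".
mixed = "Tom &amp; Jerry <script>"
safe = encode_once(mixed)
```

Running the function again on its own output returns the same string, which is the property the wrapper relies on when it cannot tell whether stored data was already encoded.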

Remember to upgrade your copy of PHP, or harass your sysadmin to do so for you! Not just to cover my blind spots (though it doesn’t hurt!) but for the sake of your own security. Keep up to date. (Disclaimer: I realize this responsibility is most often supposed to be that of the host. Hosts, despite providing otherwise exceptional service, can be dinosaurs when it comes to upgrading. Harass them.)

Thanks for catching that one Jake.

Download

Written by mark

December 2nd, 2009 at 8:03 pm

Posted in Plugins, kStats Reloaded


Asynchronous and kStats; delivering fast statistics

one comment

I don’t know why, but this post has been really hard to write. Could be the fact that I’m still extremely sore from ripping the garage apart and cleaning it top to bottom, or the fact that I’m bummed out about my new intake for my car not being in the mail today, but I just don’t find writing easy at the moment. So I’ll just try and spit it out, and eventually it will get lost in my archives anyways…

What’s new in 0.7.1?

You won’t notice any major visual changes or fancy new features in this release. I fixed a possible vulnerability in the way that some of the data was stored and retrieved and added a new opt-in program which benefits the plugin and another program, both of which I’ll go into further detail on below.

I did however bump up the versioning from 0.6.x to 0.7.x because there’s something new going on behind the scenes that will be a long term benefit to kStats and the people who use it on their blogs.

The Old Way

The aggregate is tripped every night by somebody visiting your web site. Long story short, this would be better accomplished via a cron process run directly off the server, but due to the nature of plugins and WordPress, expecting a user to set such a thing up just to use kStats would be asking a little too much.

When the aggregate was tripped, previous to this release, the process would run fast as fast can be and sort your data from the raw table into the separate totals and charts tables. This of course allows kStats to run faster on a regular basis, and store more information with a much smaller footprint than its predecessor did. The pitfall was that the poor sap who tripped the process had to wait anywhere from 1-3 seconds extra for their page to load (possibly even longer on high traffic web sites).

In this age of broadband expectations, 3 seconds is an eternity.

The New Way

kStats now uses what is called an Asynchronous HTTP Request to run the aggregate. When the scheduled time comes, kStats fires off an HTTP request to an interface that runs the whole process in the background. This means that poor sap we were talking about above no longer notices a delay in their page load, no matter what the size of your database is or how much traffic you’re getting.
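One common way to get this fire-and-forget behaviour is to open a socket, write the request, and close the connection without waiting for the response. The sketch below shows that pattern in Python rather than the plugin's PHP. It is an assumption about the general technique, not the kStats implementation, and the function names and the `/run-aggregate` path are hypothetical.

```python
import socket


def build_trigger_request(host, path):
    """Build a minimal HTTP/1.1 GET request as bytes."""
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",
        "Connection: close",
        "",  # blank line that ends the header block
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def fire_and_forget(host, path, port=80, timeout=1.0):
    """Send the request, then close the socket without reading the response."""
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(build_trigger_request(host, path))


# Hypothetical endpoint that would run the aggregate in the background.
request = build_trigger_request("example.com", "/run-aggregate")
```

The visitor's page load only pays for the connect and the write. The receiving script has to be set up to keep running after the client disconnects; in PHP that is typically done with ignore_user_abort().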

I promised when I started this project that the primary focus, regardless of features and capabilities, was to bring you the fastest plugin I could. I believe this update goes a long way to solidifying the groundwork of that promise.

Odds and Ends

There’s a new opt-in program that can be found on the Options page under the Definitions Utility – while I’m still looking for a more reliable Geolocation API (hint, hint), the user agent facility (determining OS, Browser, etc) is powered by the API provided by user-agent-string.info.

Should you choose to participate, what happens is when kStats stumbles across a user agent it can’t identify, it will immediately fire it off to user-agent-string.info so that they can identify it and include it in the next update of their API. The more user agents we can identify, the more accurate the process will be in determining exactly what people are using when they visit your site.

In addition, a possible security vulnerability has been closed up in the way that some data was being stored and returned from the database. The upgrade process will clean your current database and all information entered from now on is completely verified and sanitized. Please note that this was not an SQL injection vulnerability but instead a much smaller XSS vulnerability.

Download Changelog

Written by mark

December 2nd, 2009 at 7:19 pm