This was exciting. After years of neglect, I have decided to examine, who is registering on my WordPress site, and what is that they are doing. I knew that most of them are just a result of robot activities, and they are just a dead weight in the database (useless users). However, what about the active users, who post in forums, and leave comments, and return regularly? Let me identify and cherish those.
I knew I needed a monstrous SQL statement to link several files, and to count various record types, and sort all this in alphabetic order. So the plan was to learn the WordPress (WP) table structure, to start with just user table and slowly add one table at a time be able test my progress with each and every step.
First, what are the tables in WP? What fields are stored in those tables, and how these tables linked together?
=01= Tables in WP database
show tables; wp_commentmeta wp_comments wp_links wp_options wp_postmeta wp_posts wp_term_relationships wp_term_taxonomy wp_terms wp_usermeta wp_users
We should start exploring 3 tables: wp_users, wp_usermeta and wp_posts.
=02= Fields: users and posts
SELECT column_name FROM information_schema.columns WHERE table_name = 'wp_users'; |
SELECT column_name FROM information_schema.columns WHERE table_name = 'wp_posts'; |
column_name ------------------- ID user_login user_pass user_nicename user_email user_url user_registered user_activation_key user_status display_name |
column_name --------------------- ID post_author post_date post_date_gmt post_content post_title post_excerpt post_status comment_status ping_status post_password post_name to_ping pinged post_modified post_modified_gmt post_content_filtered post_parent guid menu_order post_type post_mime_type comment_count |
- - - - - - - - - - |
|
SELECT column_name FROM information_schema.columns WHERE table_name = 'wp_usermeta'; |
|
column_name ----------- umeta_id user_id meta_key meta_value |
What type of entries are stored in your Posts table?
=03= Type of posts
select post_type, count(*) from wp_posts group by post_type; post_type | count(*) -------------------+--------- attachment | NNNN forum | NN nav_menu_item | NN page | NN post | NNN reply | NNNN revision | NNN topic | NNN
I discovered that bbPress items identified in wp_posts table as either ‘topic’ or ‘reply’. Moving one-step forward, I am now ready to list all the users, who ever started a new topic, or replied to an existing topic.
=04= Forum posters
SELECT SQL_CALC_FOUND_ROWS a.ID, a.user_login, LEFT(a.user_email,18) AS EMAIL, LEFT(a.display_name,12) AS NAME, count(*) AS COUNT from wp_users a JOIN wp_posts b on a.ID = b.post_author where b.post_type in ('reply', 'topic') group by a.ID, a.user_login, a.user_email, a.display_name ORDER BY a.user_login; SELECT FOUND_ROWS(); ID | user_login | EMAIL | NAME | COUNT -----+----------------+--------------------+--------------+------ NNN | AAAA | xxx@xxxxxxx.com | | NN NN | AAAB | yyy@yyyyyy.com | | N
This is already a very powerful statement, but it is time to define exactly, what is that I am looking for? I need a list of users, who either:
– changed their default password, or
– left a comment, or
– posted in forums, or
– posted an article.
That will involve linking 3 or 4 tables together. Let us start with a challenging task of identifying the most humble set of users, who just changed their default password.
What information is stored in wp_usermeta? Let’s see information for just one user:
=05= Info in wp_usermeta
SELECT * FROM wp_usermeta WHERE user_id = 2775; meta_id | user_id | meta_key | meta_value --------+---------+----------------------+---------------------- 39171 | 2775 | first_name | 39172 | 2775 | last_name | 39173 | 2775 | nickname | JonhnDow 39174 | 2775 | description | 39175 | 2775 | rich_editing | true 39176 | 2775 | comment_shortcuts | false 39177 | 2775 | admin_color | fresh 39178 | 2775 | use_ssl | 0 39179 | 2775 | show_admin_bar_front | true 39180 | 2775 | wp_capabilities | a:1:{s:10:"subscriber 39181 | 2775 | wp_user_level | 0 39182 | 2775 | signup_ip | 165.231.4.122 39183 | 2775 | default_password_nag | 1
The meta key value that we are looking for is default_password_nag, and are interested to identified users who do not have key, or it is not ‘1’.
=06= Password changers
This SQL statement will list all your WP users who ever changed their default password:
SELECT SQL_CALC_FOUND_ROWS ID, user_login, SUBSTR(meta_value,1,30) AS "First Name" FROM wp_users join wp_usermeta on id = user_id AND meta_key = 'first_name' WHERE ID not in (select user_id from wp_usermeta where meta_key = 'default_password_nag' AND meta_value = '1') ORDER BY user_login; SELECT FOUND_ROWS(); ID | user_login | First Name -----+----------------+------------ NNN | xxxxxx | Xxxxxxx NNN | yyyyyyy | Yyyyy
As you can see, another potential idea could be to examine all users, who bothered to set their First or Last names fields. But this would not be as reliable. You could be a regular user with short password and with a bank First name field, but it is much harder to be regular user with First name, but with impossible sFkdfj435dkVoox password.
Next, let’s analyze the table storing users comments.
=07= Fields in wp_comments
SELECT column_name FROM information_schema.columns WHERE table_name = 'wp_comments'; column_name -------------------- comment_ID comment_post_ID comment_author comment_author_email comment_author_url comment_author_IP comment_date comment_date_gmt comment_content comment_karma comment_approved comment_agent comment_type comment_parent user_id
Looks like we would need to use field comment_author to link to wp_users. Now we are ready to discover all the users, who posted a comment under their login name ever.
Let us start with the list of user ID(s) stored in wp_comments.
=08= Commenting users
SELECT user_id, count(*) from wp_comments GROUP BY user_id; user_id | count(*) --------+--------- 0 | NNN NN | NN NNN | N
One thing to remember here is that users, who never register write the most comments. For these comments user ID field will be blank.
SELECT a.ID, a.user_login, a.display_name, LEFT(MAX(d.comment_date),10) AS "Com Date", MAX(d.comment_id) AS CID, (select CONCAT (e.comment_post_ID, ' ', SUBSTR( TRIM( e.comment_content ), 1, 5)) from wp_comments e where e.comment_id = MAX(d.comment_id)) AS "PID+COMMENT", count(*) AS CNT FROM wp_users a join wp_comments d on ID = user_id GROUP BY a.ID, a.user_login, a.display_name ORDER BY a.user_login;
This well-formatted SQL statement (above) will return these results:
ID | user_login | display_name | Com Date | CID | PID+COMMENT | CNT ----+-------------+----------------+------------+------+-------------+---- NNN | XXXXX | Xxxx Xxxxxx | 20YY-MM-DD | NNNN | NNNN ????? | N NNN | yyyyyyyy | Yyyyyyyy Yyy | 20YY-MM-DD | NNNN | NNNN Thank | NN NNN | zzzzz | Zzzzz Zzzzz | 20YY-MM-DD | NNNN | NNN Looks | N
Any questions?
Com Date – We only output first 10 characters of the date – LEFT(MAX(d.comment_date),10), and only the date for the latest comment made by this user – MAX(d.comment_date).
CID – Comment ID of the latest comment.
PID + Comment – Corresponding post ID (or article ID) and first 5 characters of the comment. Why only 5? To prevent wrapping around.
CNT – Totall count, how many comments the user actually posted.
Now it is time for useless users, namely, users who did not changes their default password, and did not post any comments, and did not participated in any forums:
=09= Useless users
SELECT a.ID, a.user_login, a.user_email, a.display_name, SUBSTR(a.user_registered, 1, 10) AS REGISTERED, SUBSTRING_INDEX(SUBSTRING_INDEX(b.meta_value, '"', 2), '"', -1) AS ROLE FROM wp_users a LEFT JOIN wp_usermeta b on a.ID = b.user_id and b.meta_key = 'wp_capabilities' LEFT JOIN wp_posts c on a.ID = c.post_author LEFT JOIN wp_comments d on a.ID = d.user_id WHERE a.ID IN (select f.user_id from wp_usermeta f where f.meta_key = 'default_password_nag' AND f.meta_value = '1') AND c.post_author IS NULL AND d.user_id IS NULL GROUP BY a.ID, a.user_login, a.display_name, a.user_registered, b.meta_value ORDER BY a.user_registered INTO OUTFILE '/_BACKUP/MySQL.csv' FIELDS TERMINATED BY ',' ENCLOSED BY '"' LINES TERMINATED BY '\n';
This time we outputting too wide of a string, and we need to output it into a CSV file to be able to read in Excel. That is why you see a new option – INTO OUTFILE.
If we narrow down the output to just 3 columns – UID, login name and email, and dare registered, we will receive output similar to this:
SELECT a.ID, a.user_login, LEFT(a.user_email,20), SUBSTR(a.user_registered, 1, 10) AS REGISTERED FROM wp_users a LEFT JOIN wp_usermeta b on a.ID = b.user_id and b.meta_key = 'wp_capabilities' LEFT JOIN wp_posts c on a.ID = c.post_author LEFT JOIN wp_comments d on a.ID = d.user_id WHERE a.ID IN (select f.user_id from wp_usermeta f where f.meta_key = 'default_password_nag' AND f.meta_value = '1') AND c.post_author IS NULL AND d.user_id IS NULL; ID | user_login | LEFT(a.user_email,20) | REGISTERED -----+-----------------+-----------------------+----------- NN | NabincCoinioria | llllll@rrrrrr.ru | 20YY-MM-DD NNN | DougThaler | dddd@ffffffff.com | 20YY-MM-DD NNN | paul253 | sssss@mmmmmmmmmmm.net | 20YY-MM-DD NNN | SteveSabo | ssssss@llllllllllllll | 20YY-MM-DD NNNN | sebrantigan | ssssssssss@sssssssss | 20YY-MM-DD
=10= Delete useless users
Using user ID from the output you can easily delete all these useless users on two steps. First, delete all related records from wp_usermeta, and then delete all related records from wp_users. Say, you want to delete all users with ID higher than 84317. Issue these two simple statements.
First, let’s check, how many records we are about to delete:
SELECT count(*) FROM wp_usermeta WHERE user_id >= 84317; SELECT count(*) FROM wp_users WHERE ID >= 84317;
Next, let’s just delete all these useless users:
DELETE FROM wp_usermeta WHERE user_id >= 84317; DELETE FROM wp_users WHERE ID >= 84317;
Hurray. Instead of spending hours deleting 20 users at a time, we just deleted thousands of robot-created users in a matter of minutes. I hope some will come up with a more convenient way to delete all this garbage.
Finally, let us combine all our knowledge from previous sections, plus some more refinements, and we are ready for this super inclusive SQL statement:
=11= Active users and their activity
SELECT * FROM (SELECT "01 User ID", "02 User Login", "03 User Email", "04 Display Name", "05 Date Registered", "06 Role", "07 Articles Count", "08 Last Forum Post ID + Date + Title", "09 Forum Post Count", "10 Last Comment ID + Date + Comment", "11 Comment Count" ) AS U1 UNION ALL SELECT * FROM (SELECT a.ID, a.user_login, a.user_email, a.display_name, SUBSTR(a.user_registered, 1, 10) AS REGISTERED, SUBSTRING_INDEX(SUBSTRING_INDEX(b.meta_value, '"', 2), '"', -1) AS ROLE, COUNT(DISTINCT case when c.post_type = 'post' then c.ID else null end) AS "Articles Count", (select CONCAT(g.ID, ' - ', left(g.post_date,10), ' - ', g.post_title) from wp_posts g where g.ID = MAX(c.ID) and g.post_type in ('reply', 'topic')) AS "Forum Post ID - Date - Title", COUNT(DISTINCT case when c.post_type in ('reply', 'topic') then c.ID else null end) AS "Forum Posts Count", (select CONCAT (e.comment_post_ID, ' - ', left(e.comment_date,10), ' - ', SUBSTR( TRIM( e.comment_content ), 1, 30)) from wp_comments e where e.comment_id = MAX(d.comment_id)) AS "Comment ID + Date + Comment", COUNT(DISTINCT d.comment_id) AS "Comment Count" FROM (select * from wp_users order by user_login) a LEFT JOIN wp_usermeta b on a.ID = b.user_id and b.meta_key = 'wp_capabilities' LEFT JOIN wp_posts c on a.ID = c.post_author LEFT JOIN wp_comments d on a.ID = d.user_id WHERE (a.ID NOT IN (select f.user_id from wp_usermeta f where f.meta_key = 'default_password_nag' AND f.meta_value = '1') OR c.post_type is not null OR d.comment_id is not null OR SUBSTRING_INDEX(SUBSTRING_INDEX(b.meta_value, '"', 2), '"', -1) != 'subscriber') AND (c.post_type in ('reply', 'topic', 'post') or c.post_type is null) GROUP BY a.ID, a.user_login, a.display_name, a.user_registered, b.meta_value ORDER BY a.user_login ) AS U2 INTO OUTFILE '/_BACKUP/MySQL.csv' FIELDS TERMINATED BY ',' ENCLOSED BY '"' LINES TERMINATED BY '\n';
The result of this super statement is a CSV file with 11 columns. Open this file in Excel and examine list of your active users. Also, add all kinds of analytical data to columns 12 and beyond.
Among other things this SQL statement shows how to do following:
– Using union all and order by together
– Using conditional distinct count – COUNT(DISTINCT case when … )
– Using SUBSTRING_INDEX twice to extract substring between two characters from a longer string
– Using LEFT JOIN and comparing value to null
– Output SQL results into CSV file
– Adding column headers to CSV output (using union all)
I would love to hear, if anyone can share a way to make this SQL statement into a PHP page that can be dynamically displayed in your admin panel. This, and useless users list, of course.
Good luck!
Be the first to comment