Wednesday, December 30, 2009

NRPE for Endian Firewall

Finally I found the time to compile NRPE (the Nagios addon) for Endian Firewall. If you are in a hurry, skip to the download section below.

First of all, an important finding: Endian Firewall (EFW) seems to be based on Fedora Core 3, so you can sometimes spare yourself some work by installing FC3 RPMs directly. And that's what we'll do right away, so we can move around EFW more easily.

Packaging and installing NRPE




Packaging and installing nagios-plugins




  • Grab the source at: http://sourceforge.net/projects/nagiosplug/files/


    cd /root
    wget http://sourceforge.net/projects/nagiosplug/files/nagiosplug/1.4.14/nagios-plugins-1.4.14.tar.gz/download



  • Repeat the procedure you did for NRPE: place the tarball on SOURCES and the spec file on SPECS:

    cp nagios-plugins-1.4.14.tar.gz /usr/src/endian/SOURCES/
    cd /usr/src/endian/SOURCES/
    tar xfvz nagios-plugins-1.4.14.tar.gz
    chown -R root:root nagios-plugins-1.4.14
    cp nagios-plugins-1.4.14/nagios-plugins.spec ../SPECS/



  • This bundle includes the so-called standard plugins for Nagios. There are a lot of them, and you may be able to cut some out so the build is quicker. Some of them also depend on perl(Net::SNMP), perl(Crypt::DES) and perl(Socket6), which you can grab from DAG's RPM repository (remember the FC3 lineage):

    cd /root
    wget http://dag.wieers.com/rpm/packages/perl-Net-SNMP/perl-Net-SNMP-5.2.0-1.1.fc3.rf.noarch.rpm
    wget http://dag.wieers.com/rpm/packages/perl-Crypt-DES/perl-Crypt-DES-2.05-3.1.fc3.rf.i386.rpm
    wget http://dag.wieers.com/rpm/packages/perl-Socket6/perl-Socket6-0.19-1.1.fc3.rf.i386.rpm


  • Finally, install everything:

    rpm -ivh perl-Net-SNMP-5.2.0-1.1.fc3.rf.noarch.rpm \
    perl-Crypt-DES-2.05-3.1.fc3.rf.i386.rpm \
    perl-Socket6-0.19-1.1.fc3.rf.i386.rpm \
    /usr/src/endian/RPMS/i386/nagios-plugins-1.4.14-1.i386.rpm





Final notes


Be aware that this is a sample demonstration. I was more interested in getting it done for my own case - since I can fix future problems myself - than in doing a proper/full EFW integration. If you think you can contribute by tweaking this build process, just drop me a note.

Download


Here are the RPMs, which include the above-mentioned tweaks (this required extra patching of the .spec file and including the patch within the source):

Tuesday, December 29, 2009

MySQL's gifts for 2009

Well, it's been a little while, but I took advantage of the holiday break to give some feedback on what's new in the MySQL world.

I'll start by pointing out the apparent discontinuation of the MySQL 5.0.x branch, at least as far as active maintenance goes. It was about time, since version 5.1 brought too many good things to be ignored.

Next in line should be 5.4.x; its new features were very well received, especially by the Solaris and InnoDB crowds - and since InnoDB is now developed as a plugin, it should evolve on these fronts independently of the core. But the really big news...

... is the arrival of version 5.5 as milestone 2 on the way to the next GA branch which, besides merging the 5.4 novelties, offers us (finally!):

  • the possibility of extending partitioning beyond the partitioning functions, which were very few - even partitioning by date required some tricks;

  • support for per-partition key caches (fantastic!);

  • the long-awaited SIGNAL and RESIGNAL statements - the hacks needed to work around this gap were terrible to maintain :) (see the sketch after this list);

  • support for semi-synchronous replication in the form of a plugin;

  • and others which, even though I don't list them here, shouldn't be overlooked in your reading!!

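As an illustration of the SIGNAL/RESIGNAL item above, here is a minimal sketch of the new 5.5 syntax (the procedure name and message are made up for the example):

[mysql]
DELIMITER $$
CREATE PROCEDURE check_non_negative(IN v INT)
BEGIN
  -- raise a user-defined error (SQLSTATE '45000') instead of the old hacks,
  -- such as forcing an error by touching a non-existent table
  IF v < 0 THEN
    SIGNAL SQLSTATE '45000' SET MESSAGE_TEXT = 'value must be non-negative';
  END IF;
END$$
DELIMITER ;
[/mysql]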


Of course there have already been early adopters of these technologies, and here is some more reading material:


I invite you to leave links here to your own articles and comments about the remaining features.

Saturday, October 17, 2009

Orphan pages update

At Lijealso's request, here is an update of the incomplete statistics of the Portuguese Wikipedia, this time covering orphan pages.

It turned out that the dump used previously was insufficient, so the pagelinks table was downloaded, this time from the 20091015 dump. To exclude redirects, the redirect table was imported as well.

Meanwhile I got tired of switching back and forth between what I was doing and the list of namespace codes, so I created a small auxiliary table:

[mysql]
mysql> create table _namespaces ( id tinyint not null, namespace varchar(50), primary key (id) );
Query OK, 0 rows affected (0.01 sec)

mysql> insert into _namespaces values (-2,'Media'),(-1,'Especial'),(0,''),(1,'Discussão'),(2,'Usuário'),(3,'Usuário Discussão'),(4,'Wikipedia'),(5,'Wikipedia Discussão'),(6,'Ficheiro'),(7,'Ficheiro Discussão'),(8,'MediaWiki'),(9,'MediaWiki Discussão'),(10,'Predefinição'),(11,'Predefinição Discussão'),(12,'Ajuda'),(13,'Ajuda Discussão'),(14,'Categoria'),(15,'Categoria Discussão'),(100,'Portal'),(101,'Portal Discussão'),(102,'Anexo'),(103,'Anexo Discussão');
Query OK, 22 rows affected (0.00 sec)
Records: 22 Duplicates: 0 Warnings: 0
[/mysql]

The result gave me an incredible total of 769854 orphan pages, so I decided to break them down by namespace to allow prioritizing the analysis:

[mysql]
mysql> select p.page_namespace,count(1) from page p
-> LEFT JOIN redirect r
-> ON p.page_id = r.rd_from
->
-> LEFT JOIN pagelinks pl
-> on pl.pl_namespace = p.page_namespace
-> and pl.pl_title = p.page_title
->
-> WHERE r.rd_from IS NULL
-> AND pl.pl_from IS NULL
->
-> group by p.page_namespace;
+----------------+----------+
| page_namespace | count(1) |
+----------------+----------+
| 0 | 12958 |
| 1 | 103645 |
| 2 | 16592 |
| 3 | 568675 |
| 4 | 1954 |
| 5 | 856 |
| 8 | 773 |
| 9 | 17 |
| 10 | 7522 |
| 11 | 1014 |
| 12 | 3 |
| 13 | 27 |
| 14 | 51735 |
| 15 | 1315 |
| 100 | 1190 |
| 101 | 117 |
| 102 | 173 |
| 103 | 1288 |
+----------------+----------+
18 rows in set (20.90 sec)
[/mysql]

The result of crossing the two tables was posted here, with a 15MB listing for the 12958 articles in the main namespace. This listing was actually made to be pasted into a wiki page; however, keep in mind it is 15MB, so I don't recommend doing that. There are other lists (such as the simplest one, in pageid,namespace,title format) in that directory.

Monday, October 5, 2009

Revisiting the Wikipedia dumps

This time in Portuguese, I decided to give [some] continuity to what I started a few days ago with the import of the Wikipedia dumps. Thanks to Rei-artur's tip, it was easy to extract the list of bots to exclude from the statistics.

[code]
[myself@speedy ~]# wget 'http://pt.wikipedia.org/w/api.php?action=query&list=allusers&aufrom=A&augroup=bot&aulimit=500&format=txt' -q -O - > bots.tmp

[myself@speedy ~]# cat bots.tmp | grep '\[name\]' | sed 's,^.*\[name\] => ,,' > /tmp/bots.txt
[/code]
While I was at it, I also fetched the user_id values to simplify the queries without making changes to the user table.
[mysql]
mysql> create table user_bots ( bot_name varchar(25) );
Query OK, 0 rows affected (0.01 sec)

mysql> load data infile '/tmp/bots.txt' into table user_bots;
Query OK, 136 rows affected (0.00 sec)
Records: 136 Deleted: 0 Skipped: 0 Warnings: 0


mysql> alter table user_bots add column bot_user_id int;
Query OK, 136 rows affected (0.01 sec)
Records: 136 Duplicates: 0 Warnings: 0

mysql> alter table user add index idx_t ( user_name );
Query OK, 119134 rows affected (2.63 sec)
Records: 119134 Duplicates: 0 Warnings: 0

mysql> update user_bots ub join user u on user_name = bot_name set ub.bot_user_id = u.user_id;
Query OK, 134 rows affected (0.00 sec)
Rows matched: 134 Changed: 134 Warnings: 0

mysql> alter table user_bots add primary key (bot_user_id);
Query OK, 136 rows affected, 1 warning (0.00 sec)
Records: 136 Duplicates: 0 Warnings: 1

mysql> show warnings;
+---------+------+---------------------------------------------------+
| Level | Code | Message |
+---------+------+---------------------------------------------------+
| Warning | 1265 | Data truncated for column 'bot_user_id' at row 71 |
+---------+------+---------------------------------------------------+
1 row in set (0.00 sec)

mysql> update user_bots set bot_user_id = -1 where bot_user_id = 0;
Query OK, 1 row affected (0.00 sec)
Rows matched: 1 Changed: 1 Warnings: 0
[/mysql]

I hadn't noticed there was a user/bot named "MediaWiki default" but, well, after creating the primary key it ended up with bot_user_id=0 and, to keep it from colliding with the aggregate for anonymous, I gave it bot_user_id=-1.

So now we are ready to complete the query where we left off last time (number of edits on distinct articles in each namespace, per user):
[mysql]
mysql> explain SELECT epn.user_name,epn.page_namespace,epn.edits
-> FROM edits_per_namespace epn
-> LEFT JOIN user_bots ub ON epn.user_id = ub.bot_user_id
-> WHERE ub.bot_user_id IS NULL
-> AND epn.user_id <> 0
-> ORDER BY edits desc limit 20;
+----+-------------+-------+--------+---------------+---------+---------+----------------------+--------+--------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+--------+---------------+---------+---------+----------------------+--------+--------------------------------------+
| 1 | SIMPLE | epn | ALL | NULL | NULL | NULL | NULL | 187624 | Using where; Using filesort |
| 1 | SIMPLE | ub | eq_ref | PRIMARY | PRIMARY | 4 | ntavares.epn.user_id | 1 | Using where; Using index; Not exists |
+----+-------------+-------+--------+---------------+---------+---------+----------------------+--------+--------------------------------------+
2 rows in set (0.00 sec)

mysql> SELECT epn.user_name,epn.page_namespace,epn.edits
-> FROM edits_per_namespace epn
-> LEFT JOIN user_bots ub ON epn.user_id = ub.bot_user_id
-> WHERE ub.bot_user_id IS NULL
-> AND epn.user_id <> 0
-> ORDER BY edits desc limit 10;
+----------------+----------------+-------+
| user_name | page_namespace | edits |
+----------------+----------------+-------+
| EMP,Nice poa | 0 | 58138 |
| Dantadd | 0 | 44767 |
| João Carvalho | 3 | 44533 |
| OS2Warp | 0 | 43396 |
| Yanguas,Sonlui | 0 | 37020 |
| Lijealso | 0 | 34157 |
| Rei-artur | 0 | 33863 |
| Tumnus | 3 | 33213 |
| Nuno Tavares | 0 | 31910 |
| Bisbis | 0 | 29886 |
+----------------+----------------+-------+
10 rows in set (0.76 sec)
[/mysql]
The complete results are here.

And now, to finish, the famous list of Wikipedians by number of edits:

[mysql]
mysql> create table edits_per_user select rev_user,count(1) as counter from revision group by rev_user;
Query OK, 119134 rows affected (12.61 sec)
Records: 119134 Duplicates: 0 Warnings: 0

mysql> select u.user_name,epu.counter
-> from edits_per_user epu
-> left join user_bots ub on ub.bot_user_id = epu.rev_user
-> join user u on epu.rev_user = u.user_id
-> where ub.bot_user_id IS NULL order by counter desc limit 10;
+----------------+---------+
| user_name | counter |
+----------------+---------+
| anonymous | 3119758 |
| EMP,Nice poa | 176338 |
| OS2Warp | 163751 |
| Dantadd | 105657 |
| Lijealso | 90025 |
| Yanguas,Sonlui | 89152 |
| Rei-artur | 83662 |
| Mschlindwein | 75680 |
| Bisbis | 75361 |
| Nuno Tavares | 73141 |
+----------------+---------+
10 rows in set (0.05 sec)
[/mysql]
The complete results are here.

Saturday, October 3, 2009

Revisiting structured data

A few days ago, during a deep dive into the use of wikis in specific fields, I came across a very interesting "snapshot" of Wikipedia, here, which illustrates, among other things, the activity on Wikipedia at several levels: Visualizing Science & Tech Activity in Wikipedia:


Source: A Beautiful WWW

The website, A Beautiful WWW, is dedicated to extracting and visualizing the distinct volumes of information we know today. I had already talked about this, and meanwhile I found out that Google provides an API for visualizing structured data.

I can think of a whole series of fun things to do with this :) Imagine, for instance, putting it all together, now that Wikimedia is committed to keeping its content available on Amazon Public Data Sets!..

Have a look here at an example of what can be done, this time with Hadoop and Hive: Hot Wikipedia Topics, Served Fresh Daily.

Monday, September 28, 2009

Finding for each time interval, how many records are "occurring" during that interval

This is a complex problem: you are mapping events (of some kind) with a start and an end timestamp, but how do you know, for a specific interval [ti,tf] (timeslot), how many of those records have start<ti and end>tf? This problem is complex because you have no records defining the timeslots to serve either as a grouping field or as a comparison field. It is a problem I've seen people approach procedurally, and that's the big hurdle in understanding SQL, whose problems are typically set problems.

The main issue in this problem is that you need to count occurrences against a list you don't have. In my real scenario, there are some restrictions to keep in mind:

  • The data set is extremely large, so this report is generated daily, for a specific day.

  • Due to the above, the table is partitioned on a filtering field (stoptime below); see the sketch right after this list.


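To make the partitioning restriction concrete, here is a hypothetical sketch of how such a table could be laid out - daily RANGE partitions on stoptime, so the WHERE clauses used later can prune to a single day (the partition names and engine are assumptions, not the real schema):

[mysql]
CREATE TABLE phone_calls_part (
  starttime DATETIME NOT NULL,
  stoptime  DATETIME NOT NULL,
  id        INT NOT NULL,
  KEY idx_stoptime (stoptime)
) ENGINE=MyISAM
PARTITION BY RANGE (TO_DAYS(stoptime)) (
  PARTITION p20090803 VALUES LESS THAN (TO_DAYS('2009-08-04')),
  PARTITION p20090804 VALUES LESS THAN (TO_DAYS('2009-08-05')),
  PARTITION pmax      VALUES LESS THAN MAXVALUE
);
[/mysql]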

Immediately, some solutions pop into my head:

  • Use a summary table of time slots: when a record is inserted, increment all the time slots it crosses by one. This is cool, but I'd like to avoid insert delay. This solution also implies having a persistent row for each timeslot over the whole time span of all the records, right? That could be from 2009-08 to 2009-09, but it could also run from 1989-09 to 2009-09, which represents ~10.5M rows, some of them possibly zero.

  • Another option could be to use cursors to iterate through the selection of records which cross a specific minute, and perhaps fill a temporary table with the results. Cursors are slow, this is a procedural approach, and it represents programming overhead.



But then again, these are both procedural solutions and that's why they don't seem so effective - actually, the first is not quite the same as the second and is fairly widely used, but it requires some extra effort and schema changes.

The solution I'm proposing is a set-theory approach: IF we had a table of timeslots (minute slots), we could just join the two tables and apply the rules we want. But we don't have one. Perhaps, though, we can generate it. This idea came up after reading the brilliant examples in Roland Bouman's MySQL: Another Ranking trick and Shlomi Noach's SQL: Ranking without self join.



Let's build an example table:
[mysql]
mysql> CREATE TABLE `phone_calls` (
-> `starttime` datetime NOT NULL,
-> `stoptime` datetime NOT NULL,
-> `id` int(11) NOT NULL,
-> PRIMARY KEY (`id`),
-> KEY `idx_stoptime` (`stoptime`)
-> ) ENGINE=MyISAM DEFAULT CHARSET=latin1;
Query OK, 0 rows affected (0.04 sec)
[/mysql]
Now manually insert some interesting records:
[mysql]
mysql> select * from phone_calls;
+---------------------+---------------------+----+
| starttime | stoptime | id |
+---------------------+---------------------+----+
| 2009-08-03 09:23:42 | 2009-08-03 09:24:54 | 0 |
| 2009-08-03 11:32:11 | 2009-08-03 11:34:55 | 2 |
| 2009-08-03 10:23:12 | 2009-08-03 10:23:13 | 1 |
| 2009-08-03 16:12:53 | 2009-08-03 16:20:21 | 3 |
| 2009-08-03 11:29:09 | 2009-08-03 11:34:51 | 4 |
+---------------------+---------------------+----+
5 rows in set (0.00 sec)
[/mysql]

As an example, you may verify that record id=2 crosses only time slot '2009-08-03 11:33:00' and no other, and record id=0 crosses none. These are perfectly legitimate call start and end timestamps.

Let's go over a couple of premises:


  • A record that fully crosses at least one minute slot can be described by this:
    MINUTESLOT(stoptime) - MINUTESLOT(starttime) >= 2

    You can think of MINUTESLOT(x) as the minute slot associated with field x of the record. It actually stands for CONCAT(LEFT(x,16),":00"), and the difference is actually a TIMESTAMPDIFF() in minutes (see the sketch after this list);

  • A JOIN will give you a product of records for each match, which means that if I could "know" a specific timeslot I could multiply it by the number of records that cross it and then GROUP BY it with a COUNT(1). But I don't have the timeslots...

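A quick sanity check of those expressions against record id=2 above:

[mysql]
-- expected: start_slot = '2009-08-03 11:32:00', stop_slot = '2009-08-03 11:34:00', slot_diff = 2
SELECT CONCAT(LEFT('2009-08-03 11:32:11',16),":00") AS start_slot,
       CONCAT(LEFT('2009-08-03 11:34:55',16),":00") AS stop_slot,
       TIMESTAMPDIFF(MINUTE,
                     CONCAT(LEFT('2009-08-03 11:32:11',16),":00"),
                     CONCAT(LEFT('2009-08-03 11:34:55',16),":00")) AS slot_diff;
[/mysql]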

As I've said, I'm generating this recordset for a specific day, and that's why these records all refer to 2009-08-03. Let's confirm I can select the recordspace I'm interested in:
[mysql]
mysql> SELECT starttime,stoptime
-> FROM phone_calls
-> WHERE
-> /* partition pruning */
-> stoptime >= '2009-08-03 00:00:00'
-> AND stoptime <= DATE_ADD('2009-08-03 23:59:59', INTERVAL 1 HOUR)
->
-> /* the real filtering:
/*> FIRST: only consider call where start+stop boundaries are out of the
/*> minute slot being analysed (seed.timeslot)
/*> */
-> AND TIMESTAMPDIFF(MINUTE, CONCAT(LEFT(starttime,16),":00"), CONCAT(LEFT(stoptime,16),":00")) >= 2
->
-> /* consequence of the broader interval that we had set to cope
/*> with calls taking place beyond midnight
/*> */
-> AND starttime <= '2009-08-03 23:59:59';
+---------------------+---------------------+
| starttime | stoptime |
+---------------------+---------------------+
| 2009-08-03 11:32:11 | 2009-08-03 11:34:55 |
| 2009-08-03 16:12:53 | 2009-08-03 16:20:21 |
| 2009-08-03 11:29:09 | 2009-08-03 11:34:51 |
+---------------------+---------------------+
3 rows in set (0.00 sec)
[/mysql]

These are the 'calls' that cross any minute in the selected day. I deliberately showed specific restrictions so you understand the many aspects involved:

  • Partition pruning is fundamental, unless you want to scan the whole 500GB table. This means you are forced to limit the scope of analysed records. Now, if you have a call starting at 23:58:00 and stopping at 00:01:02 the next day, pruning would leave that record out. So I've given 1 HOUR of margin to catch those records;

  • We had to set stoptime later than the end of the day being analysed. That also means we might catch unwanted records starting between 00:00:00 and that 1 HOUR margin, so we'll need to filter them out;
  • Finally, there's also our rule about "crossing a minute".



In the end, maybe some of these restrictions (WHERE clauses) can be removed as redundant.

Now let's see if we can generate a table of timeslots:
[mysql]
mysql> select CONVERT(@a,DATETIME) AS timeslot
-> FROM phone_calls_helper, (
-> select @a := DATE_SUB('2009-08-03 00:00:00', INTERVAL 1 MINUTE)) as init
-> WHERE (@a := DATE_ADD(@a, INTERVAL 1 MINUTE)) <= '2009-08-03 23:59:59'
-> LIMIT 1440;
+---------------------+
| timeslot |
+---------------------+
| 2009-08-03 00:00:00 |
| 2009-08-03 00:01:00 |
....
| 2009-08-03 23:58:00 |
| 2009-08-03 23:59:00 |
+---------------------+
1440 rows in set (0.01 sec)
[/mysql]

This is the exciting part: we generate the timeslots using user variables, and this might only be possible in MySQL. Notice that I need to resort to an existing table, since I can't produce results out of thin air: its records are actually used as the driving rows of my join to generate what I want. You can use any table, as long as it has at least 1440 records (the number of minutes in a day). But you should also keep in mind the kind of access being made to that table, because it can translate into unnecessary I/O if you're not careful:
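For reference, phone_calls_helper can be any table with at least 1440 rows; here is one minimal, hypothetical way to build such a seed table (the name and the row-generation trick are just one possibility):

[mysql]
CREATE TABLE phone_calls_helper (
  id INT NOT NULL AUTO_INCREMENT,
  PRIMARY KEY (id)
) ENGINE=MyISAM;

INSERT INTO phone_calls_helper VALUES
  (),(),(),(),(),(),(),(),(),(),(),();                  -- 12 rows
INSERT INTO phone_calls_helper
  SELECT NULL FROM phone_calls_helper a
  JOIN phone_calls_helper b JOIN phone_calls_helper c;  -- + 12^3 = 1728 rows
[/mysql]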
[mysql]
mysql> explain select CONVERT(@a,DATETIME) AS timeslot
-> FROM phone_calls_helper, (
-> select @a := DATE_SUB('2009-08-03 00:00:00', INTERVAL 1 MINUTE)) as init
-> WHERE (@a := DATE_ADD(@a, INTERVAL 1 MINUTE)) <= '2009-08-03 23:59:59'
-> LIMIT 1440;
+----+-------------+--------------------+--------+---------------+---------+---------+------+------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+--------------------+--------+---------------+---------+---------+------+------+--------------------------+
| 1 | PRIMARY | | system | NULL | NULL | NULL | NULL | 1 | |
| 1 | PRIMARY | phone_calls_helper | index | NULL | PRIMARY | 4 | NULL | 1440 | Using where; Using index |
| 2 | DERIVED | NULL | NULL | NULL | NULL | NULL | NULL | NULL | No tables used |
+----+-------------+--------------------+--------+---------------+---------+---------+------+------+--------------------------+
3 rows in set (0.00 sec)
[/mysql]
In my case I see the scan of the 1440 records is being done on the PRIMARY key, which is great. You should choose a table whose key cache has a high probability of being in RAM, so the index scan doesn't become I/O bound either. Scanning 1440 PRIMARY KEY entries shouldn't be much of an I/O effort even on cold datasets, but if you can avoid it anyway, all the better.
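If in doubt, a quick way to check how much index the helper table actually carries is information_schema (just a sketch):

[mysql]
SELECT table_name, engine, index_length, data_length
FROM information_schema.TABLES
WHERE table_schema = DATABASE()
  AND table_name = 'phone_calls_helper';
[/mysql]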

At this moment you are probably starting to see the solution: whichever way the optimizer orders the tables, it's always a win-win case, since the 1440 timeslots are RAM based: you can either think of the 1440 timeslots being generated and each multiplied by the number of records that cross it (Rc), or think of the 3 records that cross any timeslot, each generating the timeslots that fall between its start/stop boundaries (Tr). The mathematical result is the same either way: summing Tr over all records yields the same total as summing Rc over all timeslots.

timeslots_per_records vs. records_per_timeslots


Well, they might not represent the same effort. Remember that the timeslots are in memory, and seeking back and forth through them is less costly than seeking back and forth through possibly I/O-bound data. However, due to our "imaginary" way of generating the timeslots (which aren't made persistent in any way by that subquery), we'd need to materialize them so that we could seek on them. But that would also give us the chance to optimize some other issues, like the CONVERT(), the DATE_ADD()s, etc., and to scan only the timeslots that are crossed by a specific call, which is optimal. On the other hand, if you're going to GROUP BY the timeslot you could use an index on the timeslot table and fetch each record that crosses each timeslot. Tough decision, eh? I have both solutions and I won't benchmark them here, but since the "timeslots per record" approach made me materialize the table, I'll leave it here as an example:

[mysql]
mysql> CREATE TEMPORARY TABLE `phone_calls_helper2` (
-> `tslot` datetime NOT NULL,
-> PRIMARY KEY (`tslot`)
-> ) ENGINE=MEMORY DEFAULT CHARSET=latin1 ;
Query OK, 0 rows affected (0.00 sec)

mysql> insert into phone_calls_helper2 select CONVERT(@a,DATETIME) AS timeslot
-> FROM phone_calls_helper, (
-> select @a := DATE_SUB('2009-08-03 00:00:00', INTERVAL 1 MINUTE)) as init
-> WHERE (@a := DATE_ADD(@a, INTERVAL 1 MINUTE)) <= '2009-08-03 23:59:59'
-> LIMIT 1440;
Query OK, 1440 rows affected (0.01 sec)
Records: 1440 Duplicates: 0 Warnings: 0
[/mysql]

So now, the "timeslots per records" query should look like this:
[mysql]
mysql> explain SELECT tslot
-> FROM phone_calls FORCE INDEX(idx_stoptime)
-> JOIN phone_calls_helper2 FORCE INDEX (PRIMARY) ON
-> tslot > CONCAT(LEFT(starttime,16),":00")
-> AND tslot < CONCAT(LEFT(stoptime,16),":00")
->
-> WHERE
-> /* partition pruning */
-> stoptime >= '2009-08-03 00:00:00'
-> AND stoptime <= DATE_ADD('2009-08-03 23:59:59', INTERVAL 1 HOUR)
->
-> /* the real filtering:
/*> FIRST: only consider call where start+stop boundaries are out of the
/*> minute slot being analysed (seed.timeslot)
/*> */
-> AND TIMESTAMPDIFF(MINUTE, CONCAT(LEFT(starttime,16),":00"), CONCAT(LEFT(stoptime,16),":00")) >= 2
->
-> /* consequence of the broader interval that we had set to cope
/*> with calls taking place beyond midnight
/*> */
-> AND starttime <= '2009-08-03 23:59:59'
-> GROUP BY tslot;
+----+-------------+---------------------+-------+---------------+--------------+---------+------+------+------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+---------------------+-------+---------------+--------------+---------+------+------+------------------------------------------------+
| 1 | SIMPLE | phone_calls | range | idx_stoptime | idx_stoptime | 8 | NULL | 4 | Using where; Using temporary; Using filesort |
| 1 | SIMPLE | phone_calls_helper2 | ALL | PRIMARY | NULL | NULL | NULL | 1440 | Range checked for each record (index map: 0x1) |
+----+-------------+---------------------+-------+---------------+--------------+---------+------+------+------------------------------------------------+
2 rows in set (0.00 sec)
[/mysql]

It's interesting to see «Range checked for each record (index map: 0x1)» for which the manual states:
MySQL found no good index to use, but found that some of indexes might be used after column values from preceding tables are known.

I can't explain why it wouldn't just use the PRIMARY KEY - I tried using CONVERT() on the CONCAT()s to ensure the same data type, but no luck - but I'm probably safe, as it will probably end up using it anyway. And this is the final result:
[mysql]
mysql> SELECT tslot,count(1) FROM phone_calls FORCE INDEX(idx_stoptime) JOIN phone_calls_helper2 FORCE INDEX (PRIMARY) ON tslot > CONVERT(CONCAT(LEFT(starttime,16),":00"),DATETIME) AND tslot < CONVERT(CONCAT(LEFT(stoptime,16),":00"),DATETIME) WHERE stoptime >= '2009-08-03 00:00:00' AND stoptime <= DATE_ADD('2009-08-03 23:59:59', INTERVAL 1 HOUR) AND TIMESTAMPDIFF(MINUTE, CONCAT(LEFT(starttime,16),":00"), CONCAT(LEFT(stoptime,16),":00")) >= 2 AND starttime <= '2009-08-03 23:59:59' GROUP BY tslot;
+---------------------+----------+
| tslot | count(1) |
+---------------------+----------+
| 2009-08-03 11:30:00 | 1 |
| 2009-08-03 11:31:00 | 1 |
| 2009-08-03 11:32:00 | 1 |
| 2009-08-03 11:33:00 | 2 |
| 2009-08-03 16:13:00 | 1 |
| 2009-08-03 16:14:00 | 1 |
| 2009-08-03 16:15:00 | 1 |
| 2009-08-03 16:16:00 | 1 |
| 2009-08-03 16:17:00 | 1 |
| 2009-08-03 16:18:00 | 1 |
| 2009-08-03 16:19:00 | 1 |
+---------------------+----------+
11 rows in set (0.00 sec)
[/mysql]

Notice that I already did the GROUP BY and that it forces a temporary table and a filesort, so it's better to be careful about how many records this will generate. In my (real) case the grouping is done on more phone_calls fields, so I can probably reuse the index later. As for post-execution cleanup, since the helper table is TEMPORARY, everything will be dismissed automatically without further programming overhead.

I hope you can see this opens the door to a wide range of "set"-based solutions to problems you might otherwise try to solve in a procedural way - which is the very reason such a solution might turn out to be painful.

Sunday, September 27, 2009

Importing wikimedia dumps

We are trying to gather some particular statistics about Portuguese Wikipedia usage. I volunteered to import the ptwiki-20090926-stub-meta-history dump, which is an XML file, and we'll be running very heavy queries against it (it's my task to optimize them, somehow).

What I'd like to mention is that the importing mechanism seems to have been tremendously simplified. I remember testing a couple of tools in the past, without much success (or robustness). However, I gave mwdumper a try this time, and it seems to do the job. Note, however, that there have been schema changes since the last mwdumper release, so you should have a look at WMF Bug #18328: mwdumper java.lang.IllegalArgumentException: Invalid contributor, which carries a proposed fix that seems to work well. A special note on its memory efficiency: RAM is barely touched!

The xml.gz file is ~550MB, and was converted to a ~499MB sql.gz:

1,992,543 pages (3,458.297/sec), 15,713,915 revs (27,273.384/sec)


I've copied the schema from a running (updated!) MediaWiki to spare some time. The tables default to InnoDB, so let's simplify I/O a bit (I'm on my laptop). This will also speed up loading times a lot:
[mysql]
mysql> ALTER TABLE `text` ENGINE=Blackhole;
Query OK, 0 rows affected (0.01 sec)
Records: 0 Duplicates: 0 Warnings: 0

mysql> alter table page drop index page_random, drop index page_len;
Query OK, 0 rows affected (0.01 sec)
Records: 0 Duplicates: 0 Warnings: 0

mysql> alter table revision drop index rev_timestamp, drop index page_timestamp, drop index user_timestamp, drop index usertext_timestamp;
Query OK, 0 rows affected (0.01 sec)
Records: 0 Duplicates: 0 Warnings: 0
[/mysql]

The important thing here is to avoid the larger I/O if you don't need it at all. The text table holds the page/revision content, which I'm not interested in at all. As for MySQL's configuration (and as a personal note, anyway), the following settings will give you great InnoDB loading speeds:
[code]
key_buffer = 512K
sort_buffer_size = 16K
read_buffer_size = 2M
read_rnd_buffer_size = 1M
myisam_sort_buffer_size = 512K
query_cache_size = 0
query_cache_type = 0
bulk_insert_buffer_size = 2M

innodb_file_per_table
transaction-isolation = READ-COMMITTED
innodb_buffer_pool_size = 2700M
innodb_additional_mem_pool_size = 20M
innodb_autoinc_lock_mode = 2
innodb_flush_log_at_trx_commit = 0
innodb_doublewrite = 0
skip-innodb-checksum
innodb_locks_unsafe_for_binlog=1
innodb_log_file_size=128M
innodb_log_buffer_size=8388608
innodb_support_xa=0
innodb_autoextend_increment=16
[/code]

Now I'd recommend uncompressing the dump so it's easier to trace the whole process if it's taking too long:
[code]
[myself@speedy ~]$ gunzip ptwiki-20090926-stub-meta-history.sql.gz
[myself@speedy ~]$ cat ptwiki-20090926-stub-meta-history.sql | mysql wmfdumps
[/code]

After some minutes on a dual quad-core Xeon 2.0GHz, and with 2.4GB of datafiles, we are ready to rock! I will probably also need the user table later, which Wikimedia doesn't distribute, so I'll rebuild it now:
[mysql]
mysql> alter table user modify column user_id int(10) unsigned NOT NULL;
Query OK, 0 rows affected (0.12 sec)
Records: 0 Duplicates: 0 Warnings: 0

mysql> alter table user drop index user_email_token, drop index user_name;
Query OK, 0 rows affected (0.03 sec)
Records: 0 Duplicates: 0 Warnings: 0

mysql> insert into user(user_id,user_name) select distinct rev_user,rev_user_text from revision where rev_user <> 0;
Query OK, 119140 rows affected, 4 warnings (2 min 4.45 sec)
Records: 119140 Duplicates: 0 Warnings: 0

mysql> alter table user drop primary key;
Query OK, 0 rows affected (0.13 sec)
Records: 0 Duplicates: 0 Warnings: 0

mysql> insert into user(user_id,user_name) values(0,'anonymous');
Query OK, 1 row affected, 4 warnings (0.00 sec)
[/mysql]
It's preferable to join on INTs rather than on VARCHAR(255), and that's why I reconstructed the user table. I actually removed the PRIMARY KEY, but I set it back after the process. What happens is that there are users that have been renamed and thus appear with the same id but different user_name. The query to list them all is this:
[mysql]
mysql> select a.user_id,a.user_name from user a join (select user_id,count(1) as counter from user group by user_id having counter > 1 order by counter desc) as b on a.user_id = b.user_id order by user_id DESC;
....
14 rows in set (0.34 sec)

mysql> update user a join (select user_id,GROUP_CONCAT(user_name) as user_name,count(1) as counter from user group by user_id having counter > 1) as b set a.user_name = b.user_name where a.user_id = b.user_id;
Query OK, 14 rows affected (2.49 sec)
Rows matched: 14 Changed: 14 Warnings: 0
[/mysql]

The duplicates were removed manually (there were just 7). Now let's start to dig deeper. I'm not concerned about optimizing for now. What I wanted to run right away was the query I had asked for on the Toolserver more than a month ago:

[mysql]
mysql> CREATE TABLE `teste` (
-> `rev_user` int(10) unsigned NOT NULL DEFAULT '0',
-> `page_namespace` int(11) NOT NULL,
-> `rev_page` int(10) unsigned NOT NULL,
-> `edits` int(1) unsigned NOT NULL,
-> PRIMARY KEY (`rev_user`,`page_namespace`,`rev_page`)
-> ) ENGINE=InnoDB DEFAULT CHARSET=latin1 ;
Query OK, 0 rows affected (0.04 sec)

mysql> insert into teste select r.rev_user, p.page_namespace, r.rev_page, count(1) AS edits from revision r JOIN page p ON r.rev_page = p.page_id GROUP BY r.rev_user,p.page_namespace,r.rev_page;
Query OK, 7444039 rows affected (8 min 28.98 sec)
Records: 7444039 Duplicates: 0 Warnings: 0

mysql> create table edits_per_namespace select straight_join u.user_id,u.user_name, page_namespace,count(1) as edits from teste join user u on u.user_id = rev_user group by rev_user,page_namespace;
Query OK, 187624 rows affected (3.65 sec)
Records: 187624 Duplicates: 0 Warnings: 0

mysql> select * from edits_per_namespace order by edits desc limit 5;
+---------+---------------+----------------+--------+
| user_id | user_name | page_namespace | edits |
+---------+---------------+----------------+--------+
| 76240 | Rei-bot | 0 | 365800 |
| 0 | anonymous | 0 | 253238 |
| 76240 | Rei-bot | 3 | 219085 |
| 1740 | LeonardoRob0t | 0 | 145418 |
| 170627 | SieBot | 0 | 121647 |
+---------+---------------+----------------+--------+
5 rows in set (0.09 sec)
[/mysql]

Well, that's funny: Rei-artur's bot beats all anonymous edits summed together on the main namespace :) I still need to set up a way of discarding the bots, since they usually don't count for stats. I'll probably set a flag on the user table myself, but this is enough to get us started..
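A minimal sketch of what that flag could look like, assuming a /tmp/bots.txt file with one bot user_name per line (e.g. fetched from the MediaWiki API); the user_is_bot column is made up:

[mysql]
ALTER TABLE user ADD COLUMN user_is_bot TINYINT NOT NULL DEFAULT 0;

CREATE TEMPORARY TABLE bot_names ( bot_name VARCHAR(255) NOT NULL );
LOAD DATA INFILE '/tmp/bots.txt' INTO TABLE bot_names;

UPDATE user u JOIN bot_names b ON u.user_name = b.bot_name
SET u.user_is_bot = 1;
[/mysql]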

Listing miscellaneous Apache parameters in SNMP

We recently had to look at a server which occasionally died under a DoS. I was manually monitoring a lot of stuff when I spotted a persistently BIG Apache worker popping up occasionally and then disappearing (probably being recycled). More rarely still, I caught two of them. This machine was being flooded with blog spam from a botnet. I did the math and soon found that if the currently allowed number of workers filled up the way these did, the machine would start swapping like nuts. This seemed to be the cause.

After correcting the problem (many measures were taken, see below), I searched for Cacti templates that could make this behaviour visible. I found that neither ApacheStats nor the better Apache templates report Virtual Memory Size (VSZ) or Resident Set Size (RSS), which is explained by mod_status not reporting them either (they fetch their data by querying mod_status).

So here's a simple way of monitoring these. Suppose there is a server running some Apache workers you want to monitor, and a machine to which you want to collect the data:

Edit your server's /etc/snmp/snmpd.conf
[code]
# .... other configuration directives
exec .1.3.6.1.4.1.111111.1 ApacheRSS /usr/local/bin/apache-snmp-rss.sh
[/code]

The '1.3.6.1.4.1.111111.1' OID is a branch of '.1.3.6.1.4.1', which is assigned the meaning '.iso.org.dod.internet.private.enterprises' and is where an enterprise without an IANA-assigned code would place its OIDs. Anyway, you can use any sequence you want.

Create a file named /usr/local/bin/apache-snmp-rss.sh with the following contents:
[code]
#!/bin/sh
WORKERS=4
ps h -C httpd -o rss | sort -rn | head -n $WORKERS
[/code]

Notice that httpd is Apache's process name in CentOS. In Debian, e.g., that would be apache. Now give the script execution rights. Then go to your poller machine, from where you'll do the SNMP queries:
[code]
[root@poller ~]# snmpwalk -v 2c -c public targetserver .1.3.6.1.4.1.111111.1.101
SNMPv2-SMI::enterprises.111111.1.101.1 = STRING: "27856"
SNMPv2-SMI::enterprises.111111.1.101.2 = STRING: "25552"
SNMPv2-SMI::enterprises.111111.1.101.3 = STRING: "24588"
SNMPv2-SMI::enterprises.111111.1.101.4 = STRING: "12040"
[/code]

So this reports the 4 most memory-consuming workers (the value specified in the WORKERS script variable) with their RSS usage (that's the output of '-o rss' in the script).

Now graphing these values is a bit more complicated, especially because the graphs are usually created on a "fixed number of values" basis. That means that whenever your number of workers increases or decreases, the script has to cope with it. That's why there is filtering happening in the script: first we reverse-order the workers by RSS size, then we take only the first 4 - this means you'll be listing the most memory-consuming workers. To avoid having your graphs ask for more values than the script generates, the WORKERS script variable should be adjusted to the minimum number of Apache workers you'll ever have on your system - that should be the httpd.conf StartServers directive.

Now on to the graphs: this is the tricky part, as I find Cacti a little overcomplicated. However, you should be OK with this Netuality post. You should create individual data sources for each of the workers, and group the four in a Graph Template. This is the final result, after lots of struggling to get the correct values (I still didn't manage to get the right ones, which are ~22KB):

cacti_apache_rss_stats

In this graph you won't notice the events I described at the beginning, because other measures were taken, including dynamic firewalling, Apache tuning and auditing the blogs for comment and track/pingback permissions - we had a user wide open to spam, and that was when the automatic process of cleaning up blog spam was implemented. In any case, this graph will make similar future situations visible, though I hope they are over.

I'll try to post the Cacti templates as well, as soon as I recover from all that struggling :) Drop me a note if you're interested.

Friday, September 25, 2009

Side-effect of mysqlhotcopy and LVM snapshots on an active READ server

I just came across a particular feature of MySQL while inspecting a Query Cache that was being wiped out at backup time. Whenever you run FLUSH TABLES, the whole Query Cache gets flushed as well, even if you FLUSH TABLES a particular table. And guess what: mysqlhotcopy issues FLUSH TABLES so that the tables get synced to storage.

I actually noticed the problem with the Query Cache on a server reporting the cache flush at a [too] round time (backup time).

flush_tables_affects_query_cache

My first thought was «there's something wrong with mysqlhotcopy». But actually this is expected behaviour:

When no tables are named, closes all open tables, forces all tables in use to be closed, and flushes the query cache. With one or more table names, flushes only the given tables. FLUSH TABLES also removes all query results from the query cache, like the RESET QUERY CACHE statement.


I got curious about why the heck closing a table should invalidate the cache - maybe the "close table" mechanism is overly cautious?

Anyway, it's not mysqlhotcopy's fault. And since you should issue FLUSH TABLES for LVM snapshots for consistency as well, that method is also affected, which renders both methods pretty counter-performant on a single production server compared to mysqldump, unless you run a post-backup warm-up process. For that, it would be interesting to be able to dump the QC contents and reload them after the backup - which is not possible at the moment... bummer...
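A quick way to observe this on a test server (just a sketch; Qcache_queries_in_cache is the relevant status variable and some_table stands for any table with cached queries):

[mysql]
SHOW GLOBAL STATUS LIKE 'Qcache_queries_in_cache';  -- some non-zero value on a warm server
FLUSH TABLES some_table;                            -- flush a single table...
SHOW GLOBAL STATUS LIKE 'Qcache_queries_in_cache';  -- ...and the whole cache is gone (drops to 0)
[/mysql]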

Wednesday, September 23, 2009

Size and character set of VARCHAR fields, and their consequences

Although the declared size of VARCHAR fields does not influence index size, this data type (and similar ones) carries a series of characteristics that can seriously affect performance. Let's test the impact of these fields in two scenarios: with different character sets and with different sizes.

Considering that the MEMORY storage engine only works with fixed-length rows, and that this is the engine used for temporary tables (something to avoid, although not always possible), the consequences can be disastrous.

For this demonstration, let's set the maximum size of MEMORY tables to a value we can hit within a few seconds, i.e. the minimum:

[mysql]
mysql> set max_heap_table_size = 1;
Query OK, 0 rows affected, 1 warning (0.00 sec)

mysql> show variables like '%heap%';
+---------------------+-------+
| Variable_name | Value |
+---------------------+-------+
| max_heap_table_size | 16384 |
+---------------------+-------+
1 row in set (0.00 sec)
[/mysql]

We can see that the minimum we can get is 16KB. Let's see what happens with VARCHAR fields (i.e., of [VAR]iable length):
[mysql]
mysql> CREATE TABLE `varchar_small` (
-> `id` varchar(36) NOT NULL
-> ) ENGINE=MEMORY DEFAULT CHARSET=utf8;
Query OK, 0 rows affected (0.01 sec)

mysql> insert into varchar_small values('abcd');
Query OK, 1 row affected (0.00 sec)

mysql> -- Keep filling it up until it won't take any more...
mysql> insert into varchar_small select * from varchar_small;
ERROR 1114 (HY000): The table 'varchar_small' is full

mysql> CREATE TABLE `var_char` (
-> `id` varchar(36) NOT NULL
-> ) ENGINE=MEMORY DEFAULT CHARSET=utf8;
Query OK, 0 rows affected (0.00 sec)

mysql> insert into var_char values('abcdefgh-ijkl-mnop-qrst-uvwxyzabcedf');
Query OK, 1 row affected (0.00 sec)

mysql> -- Same thing: keep filling it up until it won't take any more...
mysql> insert into var_char select * from var_char;
ERROR 1114 (HY000): The table 'var_char' is full

mysql> select count(1) from var_char;
+----------+
| count(1) |
+----------+
| 320 |
+----------+
1 row in set (0.00 sec)

mysql> select count(1) from varchar_small;
+----------+
| count(1) |
+----------+
| 320 |
+----------+
1 row in set (0.00 sec)
[/mysql]
Right, what I did was fill the first table with content much shorter than 36 characters (just 4) and, in the second, content that filled the whole field. What we can observe is that, in this storage engine, the space taken by a barely-filled field and by a completely-filled one is always the same: it is the full size associated with the field. That is what fixed-length means, and the consequence is that the size of a VARCHAR field in MEMORY tables (read also: temporary tables) is invariably the maximum size of the field. Consequently, even though a maximum size of 255 does not influence the size of an index on that field, whenever that data has to be carried into temporary tables an enormous amount of space may be wasted (multiply that waste by the number of rows that will have to be carried into the temporary table!).
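A quick, hypothetical way to spot queries that will pay this price is to look for "Using temporary" in EXPLAIN and to watch the temporary-table counters:

[mysql]
EXPLAIN SELECT id, COUNT(*) FROM var_char GROUP BY id ORDER BY COUNT(*);
-- look for "Using temporary" in the Extra column

SHOW SESSION STATUS LIKE 'Created_tmp%tables';
-- Created_tmp_tables vs. Created_tmp_disk_tables
[/mysql]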

But it doesn't end here: besides care and common sense when defining field sizes, you also need to pay attention to their encoding. The second point is about the influence of the charset. We typically used to work in Latin1, but with the spread of localization (i18n) UTF-8 became the de facto choice. All very well, we got rid of a whole series of problems with string formatting. But it's not all roses: the UTF-8 charset can consume up to 4 bytes per character (in MySQL, 3), whereas Latin1 used just 1:

[mysql]
mysql> CREATE TABLE `varchar_utf8` (
-> `id` varchar(36) NOT NULL
-> ) ENGINE=MEMORY DEFAULT CHARSET=utf8;
Query OK, 0 rows affected (0.01 sec)

mysql> CREATE TABLE `varchar_latin1` (
-> `id` varchar(36) NOT NULL
-> ) ENGINE=MEMORY DEFAULT CHARSET=latin1;
Query OK, 0 rows affected (0.00 sec)
[/mysql]
The table's general encoding clause propagates as the default for its columns. In any case, I could have specified the encoding for each column individually. To be sure we are not being fooled by the current encoding of the mysql client, I used two PHP scripts to create a single row in each of these tables (my terminal's encoding is UTF-8, as is common nowadays):
[php]
mysql_connect('localhost','root','');
mysql_select_db('test');
mysql_query("SET NAMES UTF8");
mysql_query("INSERT INTO varchar_utf8 VALUES('ãããããããã-ãããã-ãããã-ãããã-ãããããããããããã')");
[/php]
[php]
mysql_connect('localhost','root','');
mysql_select_db('test');
mysql_query("SET NAMES Latin1");
mysql_query("INSERT INTO varchar_latin1 VALUES('ãããããããã-ãããã-ãããã-ãããã-ãããããããããããã')");
[/php]
I saved the two scripts above as varchar_utf8.php and varchar_latin1.php.temp, respectively. Finally, I converted the second script's encoding to Latin1 (because no matter what I do, my terminal's encoding is UTF-8):
[code]
[root@speedy ~]# cat varchar_latin1.php.temp | iconv -f utf8 -t iso-8859-1 > varchar_latin1.php
[/code]
And ran both:
[code]
[root@speedy ~]# php -f varchar_utf8.php
[root@speedy ~]# php -f varchar_latin1.php
[/code]
OK, now I have one record in each table. Let's use that single record in each to fill the tables until no more records can be added:
[mysql]
mysql> -- ...
mysql> insert into varchar_utf8 select * from varchar_utf8;
ERROR 1114 (HY000): The table 'varchar_utf8' is full

mysql> -- ...
mysql> insert into varchar_latin1 select * from varchar_latin1;
ERROR 1114 (HY000): The table 'varchar_latin1' is full

mysql> select count(1) from varchar_utf8;
+----------+
| count(1) |
+----------+
| 126 |
+----------+
1 row in set (0.00 sec)

mysql> select count(1) from varchar_latin1;
+----------+
| count(1) |
+----------+
| 320 |
+----------+
1 row in set (0.00 sec)
[/mysql]

There we go, we've confirmed the second problem: in the MEMORY engine, far fewer rows fit in the first table (UTF-8) than in the second (Latin1), because the fields take their maximum possible size and, in the case of VARCHAR in UTF-8, each character can occupy up to 3 bytes. So, to hold 36 characters, at least 36*3 = 108 bytes are needed! This represents a 300% overhead that may not have been accounted for when sizing the memory allowed for temporary tables.

Both scenarios are quite frequent, especially in applications that use frameworks - and in those cases, because of the range of backends they try to support (MySQL, PostgreSQL, Oracle, etc.), the queries involved are usually not optimized for any particular engine; in MySQL's case these queries often generate temporary tables unnecessarily, and optimizing them may require changes to the framework core or, at the very least, creating exceptions to the rule.

The choice of column name in the examples above was not by chance: indeed, UUID() fields are usually UTF-8 merely because the whole table is UTF-8 and yet, imagine that, there isn't a single character in a UUID() that isn't plain ASCII (in fact, nothing beyond hex digits and hyphens!):
[mysql]
mysql> select UUID();
+--------------------------------------+
| UUID() |
+--------------------------------------+
| 27d6a670-a64b-11de-866d-0017083bf00f |
+--------------------------------------+
1 row in set (0.00 sec)
[/mysql]
Since the character set of a UUID() is compatible across the various charsets, this is one of the optimizations you can easily make in your applications, contributing to better MySQL performance without any impact on the application itself.
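A minimal sketch of that optimization (the table and column names are made up): declare the UUID column with a 1-byte charset even if the rest of the table stays UTF-8:

[mysql]
CREATE TABLE sessions (
  session_uuid CHAR(36) CHARACTER SET ascii NOT NULL,  -- UUIDs are plain ASCII
  payload      VARCHAR(255) NOT NULL,                  -- regular columns stay utf8
  PRIMARY KEY (session_uuid)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
[/mysql]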

Monday, September 14, 2009

News on the SugarCRM - IPBrick integration

For the GA release of SugarCRM 5.5, we are preparing some surprises for IPBrick version 5.1:


  • Integrated support for either SugarCRM 5.2 or 5.5 (Community Edition, Professional or Enterprise).


  • A new synchronization/import method for accounts and contacts. This method dramatically cuts synchronization time: the larger the data set to import, the bigger the speed difference.


  • Thanks to this new method, some level of bidirectional synchronization will also be possible. It is actually a merge - as far as possible - of two records changed on both sides. The user will be able to control how the results should be handled:

    • strict synchronization with IP Contacts

    • synchronization of the common records (between IPBrick and SugarCRM) while preserving the data that exists only in SugarCRM (i.e., not present in IPBrick), allowing SugarCRM to develop autonomy

    • or synchronization of only the new data from IP Contacts, fully preserving the SugarCRM records.



  • Better integration with SugarCRM: the new version is much more robust when it comes to upgrade-safe changes, and DRI has reworked the integration precisely for that, which means new versions will be released faster.


  • An abstraction layer, making it possible to test directly and instantly against real customer data. A data-obfuscation feature will still be developed to keep that data confidential.


  • The ability to run an automated battery of validation tests (unit testing) over the synchronization layer. This will let us do quality control before each release of the module.


  • And of course, not least, a very small memory footprint (< 1MB on the command line);


  • The new administration interface, with a wizard that walks you through the steps, now offers the ability to extract synchronization reports (see below) and allows reviewing the final result before it is merged into SugarCRM:



Initial screen:
sugar-ipbrick-uirevamp1

The first step is importing the IP Contacts data and matching it against the current SugarCRM data. At the end of the process the operations can be reviewed:
sugar-ipbrick-uirevamp2

Finally, the last step is the merge:
sugar-ipbrick-uirevamp3


Apart from that, we are still fine-tuning the last details of the unified communications integration module, which will also be adapted to the IPBrick 5.2 changes.

Saturday, September 5, 2009

Automatically cleaning up SPAM Wordpress comments

While doing maintenance on our blogs (Wordpress), I bumped into one that had fallen prey to an active botnet. It was receiving something like 5 or 6 spam comments per minute. It was nearly the only one under such harassment, so I suspect the botnet loved it for having commenting wide open.

Since activating reCaptcha I've been monitoring my "spam folder" and I'm really confident in its guesses, so I just wrote a STORED PROCEDURE to clean up these spam comments on a periodic basis, letting me do a sitewide cleanup:

[mysql]
DELIMITER $$

DROP PROCEDURE IF EXISTS `our_blog_db`.`REMOVE_OLD_SPAM`$$
CREATE PROCEDURE `our_blog_db`.`REMOVE_OLD_SPAM` ()
MODIFIES SQL DATA
COMMENT 'remove comentarios marcados como SPAM'
BEGIN

DECLARE done BIT(1) DEFAULT false;
DECLARE commtbl VARCHAR(50);
DECLARE comments_tbls CURSOR FOR SELECT TABLE_NAME
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = 'our_blog_db' AND TABLE_NAME LIKE '%comments';
DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = true;


OPEN comments_tbls;

REPEAT
FETCH comments_tbls INTO commtbl;
SET @next_tbl = CONCAT('DELETE FROM our_blog_db.',commtbl,'
WHERE comment_approved = "spam"
AND comment_date_gmt < DATE_SUB(UTC_TIMESTAMP(), INTERVAL 15 DAY)');
PREPARE get_next_tbl FROM @next_tbl;
EXECUTE get_next_tbl;

UNTIL done END REPEAT;

CLOSE comments_tbls;


END$$

DELIMITER ;
[/mysql]

It's very easy to stick it into an EVENT, if you have MySQL 5.1 or newer, and have it do a daily cleanup automatically:

[mysql]
CREATE EVENT `EV_REMOVE_OLD_SPAM` ON SCHEDULE EVERY 1 DAY STARTS '2009-08-01 21:00:00' ON COMPLETION NOT PRESERVE ENABLE
COMMENT 'remove comentarios marcados como SPAM' DO
BEGIN

SELECT GET_LOCK('remove_spam',5) INTO @remove_spam_lock;

IF @remove_spam_lock THEN
CALL REMOVE_OLD_SPAM();

END IF;

END
[/mysql]

Enjoy!

Wednesday, September 2, 2009

About cloud computing

Last Sunday I commented on Pedro's opinion about cloud computing and thought I could give my blog a reverse trackback :) Here it goes:

I think Pedro's message is important. Cloud marketing and buzz seem to be targeted at business decision-making personnel. However, no matter what they try to look like, this is a technical decision, and I really think that companies just following this marketing hype will eventually get caught by the small print in those contracts. As a technician, I agree with Pedro on the enterprise [not] moving its core to the cloud, and that the prices are [still] overrated.

However, for medium-to-large traffic platforms that require a complex setup (meaning >4 machines), the cloud can be a solution very similar to what could be called Hardware-as-a-Service. Unavoidably, you have to move this kind of platform outside the core, even if it sits on a DMZ. Moreover, you don't usually want to mix corporate traffic with specific platforms (e.g. a multinational's CRM, the company's website, etc.). In this context, the cloud adds as much value as a regular hosting company would, IMO. No more, no less.

Having said that, I still think it has lots of potential for intermediary companies (and again, this lives in the technical scope) to provide HW solutions to customers by clicking and adding "resources" to a [kind of] shopping cart and then splitting them according to their needs. That's pretty much how Amazon seems to work - not the VPS/sliced hosting we are getting used to. Also, I see a benefit for large hosting companies (now these could be those VPS/sliced ones :) ) because they can turn income on a periodic basis to match the periodic costs. From this intermediary's perspective, one of the great features of this cloud thing is that the providers have set up quite heterogeneous provisioning systems, which a regular company can't handle - that is to say, you can set up a small/medium/full-blown pile of servers with a few clicks. Time also costs money.

Of course, this is all theoretical while the prices remain so high. It looked even worse from my searches (although I confess I didn't explore in depth): with the cloud you will pay much more to have the same resources you can find on typical dedicated hosting servers - but it's also true that you rarely use them at 100%, so you may eventually get a better cost/performance benefit in the near future (because when you buy or rent hardware it's very difficult to recover the cost).

My conclusion is that the cloud is trying to attract customers on hype, and that makes our technical advice more needed than ever: explain to the client how to plan, how to implement, how to scale, and where exactly the cloud fits in. To them, my recommendation is this: being in the cloud just because "it's cool", or because it seems so simple that you won't need specialized IT staff, will eventually turn against you.

Monday, August 31, 2009

SugarCRM data generator

SugarCRM ships with a data generator to populate the DB with sample accounts, contacts, etc. However, it can also be used to generate DBs for testing, namely for load testing. From what I understood, this data generator uses a fixed seed so that the data generated for one DB is the same as for another, so they can be compared with each other, for example. Here are the necessary steps [used on version 5.5]:

Look for large_scale_test in config.php (near line 200) and change it to true:

[code]
- 'large_scale_test' => false,
+ 'large_scale_test' => true,
[/code]

Put the following script in install/dataGeneratorKit.php. This script will serve as a wrapper for populateSeedData.php, which already contains the logic to populate the DB in proportional terms:

[php]
<?php
define('sugarEntry',1);
require_once('include/entryPoint.php');
require_once('install/install_utils.php');

require_once('modules/TableDictionary.php');

require_once "include/database/DBManagerFactory.php";
include "install/populateSeedData.php";
?>
[/php]

You may need to comment out a few lines in the i18n files, similar to the ones shown below (they reference SugarThemeRegistry). At least in 5.5beta it was necessary:

[php]
...
// 'LBL_EMAIL_ADDRESS_BOOK_TITLE_ICON' => 'getImageURL('icon_email_addressbook.gif').' align=absmiddle border=0> Address Book',
// 'LBL_EMAIL_ADDRESS_BOOK_TITLE_ICON_SHORT' => 'getImageURL('icon_email_addressbook.gif').' align=absmiddle border=0> Addr...',
...
[/php]

If needed, change the values in install/populateSeedData.php:
[php]
if($large_scale_test) {
// increase the cuttoff time to 1 hour
ini_set("max_execution_time", "3600");
$number_contacts = 100000;
$number_companies = 15000;
$number_leads = 100000;
}
[/php]

Finally, run:
[code]
php -f install/dataGeneratorKit.php
[/code]

Notes

  • I don't know how efficient the script is, since it already crashed on me with ''memory exhausted'' at a 256MB memory_limit.



I took the opportunity to copy this article to the SugarCRM Wiki.

Friday, August 28, 2009

Multiple field index vs external primary key calculation

I came across a client that was using a VARCHAR(200) as a MyISAM PRIMARY KEY for detecting collisions, calculated externally (by the app) and fed into MySQL. I suspected the reason for this was to [try to] speed up inserts, or a fear of collisions combined with not wanting unnecessary data fields. The table was just a bunch of e-mail log rows.

The rows were meant to be quickly processed (selected within a DATETIME range) and archived. The obvious consequences are: big index size, poor performance. Let's see why:


  • The index was the concatenation of some fields, plus an extra one, "Message-ID" - which might not be unique, AFAICT. In the end, the key itself was inserted in random order, since it started with the 'sender' address (see the maillogsmiorig schema below). This obviously led to poor key insertion (random order), of course, but also to duplicated content in the key itself, as the table engine was MyISAM.


  • Moreover, after the loading, the follow-up operations were slower because they ran on MyISAM: an index would eventually be used, but the access to the data files was unavoidable. That could also be circumvented with InnoDB, since the data is stored next to the index.



Due to several factors - including that DATETIME field - I suggested migrating to InnoDB with a PK composed of the necessary fields. Moreover, the first field would be the DATETIME, which imposes a roughly chronological order for better key insertion - and, at the same time, the processing was being done over specific intervals of that field.
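Just to make that concrete, here is the kind of range query the processing relies on - a sketch only, with made-up dates, against the maillogsinnonew table defined further down; since date_time is the leftmost column of the new PK, InnoDB can resolve the range directly on the clustered index:

[mysql]
mysql> EXPLAIN SELECT * FROM maillogsinnonew
-> WHERE date_time BETWEEN '2009-08-01 00:00:00' AND '2009-08-02 00:00:00';
[/mysql]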

I sampled 1M rows of the live data for a test: I wanted to compare the original method with my overall solution, which you can check in maillogsinnonew below. I took the opportunity to isolate the storage engine benefit by running the test for both engines and both solutions (previous/orig and new):

[mysql]
mysql> SELECT * FROM maillogs INTO OUTFILE '/export/home/gpshumano/teste/maillogsorig.sql';
Query OK, 1000000 rows affected (6.47 sec)

mysql> select ip,sender,receiver,date_time,SUBSTRING(chave,LENGTH(SUBSTRING_INDEX(chave,'|',2))+2) FROM maillogsminew INTO OUTFILE '/export/home/gpshumano/teste/maillogsnew.sql';
Query OK, 1000000 rows affected (4.42 sec)


mysql> CREATE TABLE `maillogsinnonew` (
-> `ip` varchar(16) NOT NULL,
-> `sender` varchar(320) NOT NULL,
-> `receiver` varchar(320) NOT NULL,
-> `date_time` datetime NOT NULL,
-> `message_id` varchar(200) NOT NULL,
-> PRIMARY KEY (`date_time`,`sender`,`receiver`,`message_id`)
-> ) ENGINE=InnoDB DEFAULT CHARSET=latin1;
Query OK, 0 rows affected (1.27 sec)


mysql> CREATE TABLE `maillogsminew` (
-> `ip` varchar(16) NOT NULL,
-> `sender` varchar(320) NOT NULL,
-> `receiver` varchar(320) NOT NULL,
-> `date_time` datetime NOT NULL,
-> `message_id` varchar(200) NOT NULL,
-> PRIMARY KEY (`date_time`,`sender`,`receiver`,`message_id`)
-> ) ENGINE=MyISAM DEFAULT CHARSET=latin1;
Query OK, 0 rows affected (0.13 sec)

mysql> CREATE TABLE `maillogsmiorig` (
-> `chave` varchar(200) NOT NULL,
-> `ip` varchar(16) NOT NULL,
-> `sender` varchar(320) NOT NULL,
-> `receiver` varchar(320) NOT NULL,
-> `date_time` datetime NOT NULL,
-> PRIMARY KEY (`chave`),
-> KEY `maillogs_date_time_idx` (`date_time`)
-> ) ENGINE=MyISAM DEFAULT CHARSET=latin1;
Query OK, 0 rows affected (0.14 sec)

mysql> CREATE TABLE `maillogsinnoorig` (
-> `chave` varchar(200) NOT NULL,
-> `ip` varchar(16) NOT NULL,
-> `sender` varchar(320) NOT NULL,
-> `receiver` varchar(320) NOT NULL,
-> `date_time` datetime NOT NULL,
-> PRIMARY KEY (`chave`),
-> KEY `maillogs_date_time_idx` (`date_time`)
-> ) ENGINE=InnoDB DEFAULT CHARSET=latin1;
Query OK, 0 rows affected (2.81 sec)

mysql> LOAD DATA INFILE '/export/home/gpshumano/teste/maillogsnew.sql' INTO TABLE maillogsminew;
Query OK, 1000000 rows affected (34.83 sec)
Records: 1000000 Deleted: 0 Skipped: 0 Warnings: 0

mysql> LOAD DATA INFILE '/export/home/gpshumano/teste/maillogsnew.sql' INTO TABLE maillogsinnonew;
Query OK, 1000000 rows affected (1 min 40.56 sec)
Records: 1000000 Deleted: 0 Skipped: 0 Warnings: 0

mysql> LOAD DATA INFILE '/export/home/gpshumano/teste/maillogsorig.sql' INTO TABLE maillogsinnoorig;
Query OK, 1000000 rows affected (6 min 54.14 sec)
Records: 1000000 Deleted: 0 Skipped: 0 Warnings: 0

mysql> LOAD DATA INFILE '/export/home/gpshumano/teste/maillogsorig.sql' INTO TABLE maillogsmiorig;
Query OK, 1000000 rows affected (1 min 17.06 sec)
Records: 1000000 Deleted: 0 Skipped: 0 Warnings: 0
[/mysql]

This was the aftermath. Comparing the load times you'll see a 75% gain in speed for InnoDB and 55% for MyISAM. Moreover, right after the loads I ran an OPTIMIZE TABLE to get more precise table status:
[mysql]
mysql> show table status;
+----------------------+--------+---------+------------+---------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+---------------------+---------------------+-------------------+----------+----------------+---------+
| Name | Engine | Version | Row_format | Rows | Avg_row_length | Data_length | Max_data_length | Index_length | Data_free | Auto_increment | Create_time | Update_time | Check_time | Collation | Checksum | Create_options | Comment |
+----------------------+--------+---------+------------+---------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+---------------------+---------------------+-------------------+----------+----------------+---------+
| maillogsinnonew | InnoDB | 10 | Compact | 1048709 | 144 | 152043520 | 0 | 0 | 6144 | NULL | 2009-08-24 20:00:44 | NULL | NULL | latin1_swedish_ci | NULL | | |
| maillogsinnoorig | InnoDB | 10 | Compact | 1041388 | 392 | 408944640 | 0 | 216006656 | 4096 | NULL | 2009-08-24 20:01:35 | NULL | NULL | latin1_swedish_ci | NULL | | |
| maillogsminew | MyISAM | 10 | Dynamic | 1000000 | 126 | 126120088 | 281474976710655 | 100151296 | 0 | NULL | 2009-08-24 20:00:55 | 2009-08-24 20:06:08 | NULL | latin1_swedish_ci | NULL | | |
| maillogsmiorig | MyISAM | 10 | Dynamic | 1000000 | 173 | 173720872 | 281474976710655 | 166667264 | 0 | NULL | 2009-08-24 20:01:01 | 2009-08-24 20:18:24 | 2009-08-24 20:18:31 | latin1_swedish_ci | NULL | | |
+----------------------+--------+---------+------------+---------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+---------------------+---------------------+-------------------+----------+----------------+---------+
4 rows in set (0.03 sec)
[/mysql]

[code]
bash-3.00# ls -lah /iscsipool/DATA/tavares/
total 2288740
drwx------ 2 mysql mysql 13 ago 24 20:37 .
drwxr-xr-x 10 mysql mysql 17 ago 24 19:45 ..
-rw-rw---- 1 mysql mysql 65 ago 24 19:45 db.opt
-rw-rw---- 1 mysql mysql 8,5K ago 24 20:29 maillogsinnonew.frm
-rw-rw---- 1 mysql mysql 156M ago 24 20:32 maillogsinnonew.ibd
-rw-rw---- 1 mysql mysql 8,5K ago 24 20:31 maillogsinnoorig.frm
-rw-rw---- 1 mysql mysql 420M ago 24 20:40 maillogsinnoorig.ibd
-rw-rw---- 1 mysql mysql 8,5K ago 24 20:00 maillogsminew.frm
-rw-rw---- 1 mysql mysql 120M ago 24 20:06 maillogsminew.MYD
-rw-rw---- 1 mysql mysql 96M ago 24 20:37 maillogsminew.MYI
-rw-rw---- 1 mysql mysql 8,5K ago 24 20:01 maillogsmiorig.frm
-rw-rw---- 1 mysql mysql 166M ago 24 20:18 maillogsmiorig.MYD
-rw-rw---- 1 mysql mysql 159M ago 24 20:37 maillogsmiorig.MYI
[/code]

As you can see, you gain not only in speed but also in smaller data file sizes - 63% for InnoDB and 33% for MyISAM. The smaller sizes also contribute to a longer stay in the buffer pool and reduced I/O. In any case, since the rows were processed immediately after the loading, they would probably all still be in InnoDB's buffer pool, avoiding the extra I/O of refetching the data from the MyISAM data files.
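For reference, those percentages come straight from the LOAD DATA timings and the file sizes in the listing above; here is the quick arithmetic, done in MySQL just for convenience:

[mysql]
mysql> SELECT ROUND(100 * (1 - 100.56 / 414.14), 1) AS inno_load_gain_pct,
-> ROUND(100 * (1 - 34.83 / 77.06), 1) AS myisam_load_gain_pct,
-> ROUND(100 * (1 - 156 / 420), 1) AS inno_size_gain_pct,
-> ROUND(100 * (1 - (120 + 96) / (166 + 159)), 1) AS myisam_size_gain_pct;
[/mysql]

That works out to roughly 75.7% and 54.8% faster loads, and 62.9% and 33.5% smaller files, for InnoDB and MyISAM respectively.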

Also notice how maillogsinnonew reports its index (PK) size as 0 - that's because in InnoDB the PK fields are part of the data nodes themselves. In practical terms, this index is just a matter of data ordering, i.e., no extra space is required to store the index itself!

Thursday, August 27, 2009

Update on mysql-query-browser "affected rows"

Yesterday I tried massaging mysql-gui-tools a bit to see if I could make affected rows show up for modification DMLs, such as INSERT, DELETE and UPDATE. Here is a briefing about it, along with all the patches I used - some taken from the current Fedora 10 source RPM, plus a crude one of my own to show affected rows.

Update

Fixing the problem at the root will take a bit more time. This kind of protocol information, such as affected_rows, is lost because the folks at MySQL assumed that data-changing DML never returns useful results - but it actually does: the response protocol packet carries lots of info, such as whether an index was used, whether you are inside a transaction, etc. It may also be that the protocol changed over time and the Query Browser didn't catch up.

This translates into a lot of fixes: adapting the methods that discard results whenever the result variable is NULL so that they stop doing so, and finding a way to leave affected_rows set somewhere.

So, for the moment, here is the list of patches I used; the ones in bold are my own. The RPM will have to wait, since I want to build it for mysql-gui-tools-5.0r14 (which is the version you should apply these patches against) instead of the r12 currently available. In the meantime, if I find more patches to submit, maybe I'll have a proper look at it. Here are the patches:

Wednesday, August 26, 2009

Compiling mysql-gui-tools

A colleague and I were missing some features in mysql-query-browser and, since no one at MySQL AB seems very interested in supporting it, I thought I could have a look at it myself. The system is Fedora 10 (still), and I use it mainly because it's small, simple to use, and it's GTK!


Setting up the build (compile) environment

I got the sources following the notes at MySQL Forge: Building MySQL GUI Tools on Linux:

[code]
svn co http://svn.mysql.com/svnpublic/mysql-gui-common/trunk mysql-gui-common
svn co http://svn.mysql.com/svnpublic/mysql-query-browser/trunk mysql-query-browser
[/code]

You'll need a patch from Oden Eriksson attached to Bug #32184, or you can use the one from the RPM - otherwise you'll get the error: ‘SigC’ has not been declared mentioned in that bug report. I had to trim it to build from the SVN tree, and patched mysql-gui-common and mysql-query-browser independently (i.e., I split the patch).

Building mysql-gui-common is straightforward:

[code]
./autogen.sh
./configure --prefix=/home/nmct/mysql-query-browser/fake
make -j 2
make install
[/code]

Building mysql-query-browser seems to require explicitly pointing it to libgtkhtml, besides the packages it mentions in its error messages:

[code]
[root@speedy ~]# rpm -qa | grep gtkhtml | grep devel
gtkhtml3-devel-3.24.5-1.fc10.i386
[root@speedy ~]# rpm -ql gtkhtml3-devel | grep pc
/usr/lib/pkgconfig/gtkhtml-editor.pc
/usr/lib/pkgconfig/libgtkhtml-3.14.pc
[/code]

So it's easy to spot the needed --with switch. I had to apply several other patches, which I simply took from the source RPM. Most of them were applied with -p2.

[code]
[nmct@speedy mysql-query-browser]$ patch -p 2 < mysql-gui-tools-5.0_p12-libsigc++-2.2.patch
patching file source/linux/MQResultTab.h
[nmct@speedy mysql-query-browser]$ patch -p2 < mysql-gui-tools-gtksourceview-cflags.patch
patching file source/linux/Makefile.in
Hunk #1 succeeded at 119 (offset 17 lines).
[nmct@speedy mysql-query-browser]$ patch -p2 < gtk_deprecated_typedefs.patch
patching file source/linux/gtksourceview/gtksourceview/Makefile.in
...

...
./configure --with-gtkhtml=libgtkhtml-3.14 --prefix=/home/nmct/mysql-query-browser/fake
make -j 2
make install
[/code]

And that should be it - actually, there was a path concatenation issue (it was looking for ...fake/usr/local/share...), which I quickly fixed with symlinks. After that, we should be ready to rock.


First patch: mysql_affected_rows

One of the features I miss most is the number of affected rows for DML commands such as UPDATE and INSERT. This was not doable in five minutes because of the UI split: the value of mysql_affected_rows() doesn't seem to reach the GUI. So I made a simple test, and succeeded.

[screenshot: mysql-query-browser-affected-rows1]

This looks promising. I just set a global variable, which will do for now. I still have to check for potential race conditions, but expect the polished patch, along with a new RPM (for Fedora 10, at least), in the near future.

Walking through a MyISAM table in a controlled way

I thought I'd share a test I did some time ago. The goal was to demonstrate that we can walk through a MyISAM table (and only in this specific case) in a controlled way (i.e., predict the order in which the records are returned), and to check whether that traversal order holds for operations other than SELECT, such as DELETE. It's easy to think of an application: if I want, for instance, to move blocks of records from one table to another, it becomes essential that the DELETE operation also behaves as expected since, as we know, MyISAM is not transactional and, if something fails, we want this deterministic access to the records to know what to revert.

To some this may seem obvious, but without looking at the code we can never be sure. Besides, the code can change, so we'd better make sure :-) We won't even try to extrapolate the conclusions to InnoDB, because internally it works in a completely different way. In fact, a single aspect of its architecture - the famous clustered index, which may or may not be an internal one, but always exists! - is enough to suspect that its behaviour would be completely different.

So, in practice, what we really want is to be sure we know which records will come up across successive iterations, and whether that certainty extends to DELETEs (and, eventually, to UPDATEs) - i.e., to make our process deterministic.
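To make the use case concrete, here is a minimal sketch of that batch move, assuming a hypothetical archive table teste_arquivo and relying precisely on INSERT ... SELECT and DELETE walking the records in the same (natural) order - the teste table itself is defined right below:

[mysql]
mysql> CREATE TABLE teste_arquivo LIKE teste;
mysql> -- copy the first block of records in natural order...
mysql> INSERT INTO teste_arquivo SELECT * FROM teste LIMIT 100;
mysql> -- ...and then remove that same block, trusting DELETE to follow the same order
mysql> DELETE FROM teste LIMIT 100;
[/mysql]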

Let's start with a simple table:

[mysql]
CREATE TABLE `teste` (
`_int` int(11) NOT NULL DEFAULT '0',
`_char` varchar(5) NOT NULL DEFAULT '',
KEY `idx_int` (`_int`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1
[/mysql]

And insert some data:
[mysql]
mysql> INSERT INTO teste VALUES (2,'c'), (1,'e'), (1,'b'), (1,'z'), (2,'b'), (2,'d'),(3,'m');
Query OK, 7 rows affected (0.00 sec)
Records: 7 Duplicates: 0 Warnings: 0

mysql> SELECT SQL_NO_CACHE * FROM teste;
+------+-------+
| _int | _char |
+------+-------+
| 2 | c |
| 1 | e |
| 1 | b |
| 1 | z |
| 2 | b |
| 2 | d |
| 3 | m |
+------+-------+
7 rows in set (0.00 sec)
[/mysql]

The order in which the records were inserted is fundamental. We can observe that this table scan is done in natural order, also according to the Query Optimizer:
[mysql]
mysql> EXPLAIN SELECT SQL_NO_CACHE * FROM teste;
+----+-------------+-------+------+---------------+------+---------+------+------+-------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+---------------+------+---------+------+------+-------+
| 1 | SIMPLE | teste | ALL | NULL | NULL | NULL | NULL | 7 | |
+----+-------------+-------+------+---------------+------+---------+------+------+-------+
1 row in set (0.00 sec)
[/mysql]

Perfect. I'm asking for all the fields of each row, including _char, which is not indexed, and the Optimizer behaves as expected. But now let's look at a slightly different query:
[mysql]
mysql> EXPLAIN SELECT SQL_NO_CACHE _int FROM teste;
+----+-------------+-------+-------+---------------+---------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+---------------+---------+---------+------+------+-------------+
| 1 | SIMPLE | teste | index | NULL | idx_int | 4 | NULL | 7 | Using index |
+----+-------------+-------+-------+---------------+---------+---------+------+------+-------------+
1 row in set (0.00 sec)
[/mysql]

It's interesting how the Optimizer recognized that, if I only want a field that happens to be indexed, it can fetch it directly from the index (that's what Using index in the Extra column means), avoiding the data files altogether. However, this means we will get the records in index order:
[mysql]
mysql> SELECT SQL_NO_CACHE _int FROM teste;
+------+
| _int |
+------+
| 1 |
| 1 |
| 1 |
| 2 |
| 2 |
| 2 |
| 3 |
+------+
7 rows in set (0.00 sec)
[/mysql]

This matters more for this test than it may seem. If I'm forced to request more fields than that one, the Optimizer goes back to the table scan. And adding an ORDER BY is not enough...
[mysql]
mysql> explain SELECT SQL_NO_CACHE * FROM teste ORDER BY _int;
+----+-------------+-------+------+---------------+------+---------+------+------+----------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+---------------+------+---------+------+------+----------------+
| 1 | SIMPLE | teste | ALL | NULL | NULL | NULL | NULL | 7 | Using filesort |
+----+-------------+-------+------+---------------+------+---------+------+------+----------------+
1 row in set (0.00 sec)
[/mysql]
... because the Optimizer may want to use a temporary table to do the sorting, which is what Using filesort in the Extra column tells us. This may look like an Optimizer flaw, but the truth is that the Optimizer is smart enough to determine that a ''full table scan'' can be more efficient than walking the index in order and then fetching the remaining data from the data files at completely unordered, probably random, locations on disk (the I/O would show up immediately) - of course, that may not be the case for this table, but for very large tables it can be disastrous. So we would have to explicitly force the use of the index since, at least in my case, not even the USE INDEX hint helped:
[mysql]
mysql> explain SELECT SQL_NO_CACHE * FROM teste FORCE INDEX(idx_int) ORDER BY _int;
+----+-------------+-------+-------+---------------+---------+---------+------+------+-------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+---------------+---------+---------+------+------+-------+
| 1 | SIMPLE | teste | index | NULL | idx_int | 4 | NULL | 7 | |
+----+-------------+-------+-------+---------------+---------+---------+------+------+-------+
1 row in set (0.00 sec)
[/mysql]

In fact, the Optimizer is so stubborn that, even when we force the use of the index, it discards it if we drop the ORDER BY, because it knows that for a table scan the order of the records is irrelevant and, as such, it doesn't need the index at all. There must be an explanation for this behaviour - which I'll have to research - but it suits us very well: if the Optimizer grabbed whatever index looked good to it, it would be hard to get the records in natural order without testing with an EXPLAIN first. It also seems worth highlighting the following:
[mysql]
mysql> SELECT SQL_NO_CACHE * FROM teste FORCE INDEX(idx_int) ORDER BY _int;
+------+-------+
| _int | _char |
+------+-------+
| 1 | e |
| 1 | b |
| 1 | z |
| 2 | c |
| 2 | b |
| 2 | d |
| 3 | m |
+------+-------+
7 rows in set (0.00 sec)
[/mysql]

In other words, within the index, when there are collisions (duplicate values), they are simply appended at the end. This means that, after the sort, the order in which we get the records is... the natural order.

Alright then, for now we can assume that, if we walk the table with SELECT ... LIMIT 1, we can fetch the records one by one either in natural order or in the order of whichever index we choose. But my big problem was with the removal, since there is no EXPLAIN for DELETE. Which of the two methods does DELETE use?

[mysql]
mysql> DELETE FROM teste LIMIT 1;
Query OK, 1 row affected (0.00 sec)

mysql> SELECT SQL_NO_CACHE * FROM teste;
+------+-------+
| _int | _char |
+------+-------+
| 1 | e |
| 1 | b |
| 1 | z |
| 2 | b |
| 2 | d |
| 3 | m |
+------+-------+
6 rows in set (0.00 sec)
[/mysql]

Well, so far it seems to be the natural order. Of course, if I specify ORDER BY _int, the next record to be deleted should be (1,e) - because it's the first one in the idx_int index, and we know the value of the _char field will be the one dictated by the natural order - it remains to be seen whether the Optimizer thinks it needs a temporary table for that sort. I'm convinced it doesn't: since no specific field is being selected, there's no reason not to use the idx_int index. Let's just confirm:
[mysql]
mysql> DELETE FROM teste ORDER BY _int LIMIT 1;
Query OK, 1 row affected (0.00 sec)

mysql> SELECT SQL_NO_CACHE * FROM teste;
+------+-------+
| _int | _char |
+------+-------+
| 1 | b |
| 1 | z |
| 2 | b |
| 2 | d |
| 3 | m |
+------+-------+
5 rows in set (0.00 sec)
[/mysql]

All good, as predicted. But there's more. Tables rarely have a single index, and it could happen that our _char field gets covered by an index, which would make things a little different:
[mysql]
mysql> ALTER TABLE teste ADD KEY idx_char(_char);
Query OK, 5 rows affected (0.00 sec)
Records: 5 Duplicates: 0 Warnings: 0

mysql> EXPLAIN SELECT SQL_NO_CACHE * FROM teste;
+----+-------------+-------+------+---------------+------+---------+------+------+-------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+---------------+------+---------+------+------+-------+
| 1 | SIMPLE | teste | ALL | NULL | NULL | NULL | NULL | 5 | |
+----+-------------+-------+------+---------------+------+---------+------+------+-------+
1 row in set (0.00 sec)
[/mysql]

I find this interesting because it's as if the Optimizer, «when in doubt, picks none»: since it has no criterion to choose one index over the other, it uses neither.


Actually, the Optimizer uses no index because it has no hint about which one to grab. As a rule, it will pick the one with the highest cardinality, given the hints it has available:



[mysql]
mysql> OPTIMIZE TABLE teste;
+------------+----------+----------+----------+
| Table | Op | Msg_type | Msg_text |
+------------+----------+----------+----------+
| test.teste | OPTIMIZE | status | OK |
+------------+----------+----------+----------+
1 row in set (0.00 sec)

mysql> SHOW INDEXES FROM teste;
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| teste | 1 | idx_int | 1 | _int | A | 2 | NULL | NULL | | BTREE | |
| teste | 1 | idx_char | 1 | _char | A | 5 | NULL | NULL | | BTREE | |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
2 rows in set (0.00 sec)

mysql> EXPLAIN SELECT SQL_NO_CACHE * FROM teste WHERE _int = 1 AND _char = 'e';
+----+-------------+-------+------+------------------+----------+---------+-------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+------------------+----------+---------+-------+------+-------------+
| 1 | SIMPLE | teste | ref | idx_int,idx_char | idx_char | 7 | const | 1 | Using where |
+----+-------------+-------+------+------------------+----------+---------+-------+------+-------------+
1 row in set (0.00 sec)
[/mysql]

That is, idx_char is used for the filtering and, since it had the potential to filter out more records, it's the index chosen by the Optimizer, which also tells us it will still need to go through the data files to filter on the _int field (Using where).

I know the * is expanded into all the fields by the Optimizer; so what if there were a covering index?

[mysql]
mysql> ALTER TABLE teste ADD KEY idx_int_char(_int,_char);
Query OK, 5 rows affected (0.01 sec)
Records: 5 Duplicates: 0 Warnings: 0

mysql> EXPLAIN SELECT SQL_NO_CACHE * FROM teste;
+----+-------------+-------+-------+---------------+--------------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+---------------+--------------+---------+------+------+-------------+
| 1 | SIMPLE | teste | index | NULL | idx_int_char | 11 | NULL | 5 | Using index |
+----+-------------+-------+-------+---------------+--------------+---------+------+------+-------------+
1 row in set (0.00 sec)
[/mysql]

As expected, the index will be used. Let's see what happens with DELETE. To test it, we force the records to change their position in the index (while keeping the natural order) and create another similar table with the same records, because this order will matter further below:

[mysql]
mysql> UPDATE teste SET _int = 4 WHERE _int = 1 AND _char = 'b'; UPDATE teste SET _int = 5, _char = 'f' WHERE _int = 2 AND _char = 'b';

mysql> create table teste3 like teste;
Query OK, 0 rows affected (0.00 sec)

mysql> insert into teste3 select * from teste IGNORE INDEX(idx_int_char);
Query OK, 5 rows affected (0.00 sec)
Records: 5 Duplicates: 0 Warnings: 0

mysql> select * from teste3 IGNORE INDEX(idx_int_char);
+------+-------+
| _int | _char |
+------+-------+
| 4 | b |
| 1 | z |
| 5 | f |
| 2 | d |
| 3 | m |
+------+-------+
5 rows in set (0.00 sec)

mysql> DELETE FROM teste3 LIMIT 1;
Query OK, 1 row affected (0.00 sec)

mysql> select * from teste3 IGNORE INDEX(idx_int_char);
+------+-------+
| _int | _char |
+------+-------+
| 1 | z |
| 5 | f |
| 2 | d |
| 3 | m |
+------+-------+
4 rows in set (0.00 sec)
[/mysql]

That is, with the SELECTs before and after, we can confirm that the DELETE did not pick up any index, not even the covering index.

But going back to the teste table: does the fact that we use a PRIMARY KEY (which is unique and non-null) influence the results? This question is probably only relevant to those who know InnoDB.

[mysql]
mysql> ALTER TABLE teste DROP KEY idx_int;
Query OK, 5 rows affected (0.01 sec)
Records: 5 Duplicates: 0 Warnings: 0

mysql> ALTER TABLE teste ADD PRIMARY KEY(_int);
Query OK, 5 rows affected (0.01 sec)
Records: 5 Duplicates: 0 Warnings: 0

mysql> EXPLAIN SELECT SQL_NO_CACHE * FROM teste;
+----+-------------+-------+------+---------------+------+---------+------+------+-------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+---------------+------+---------+------+------+-------+
| 1 | SIMPLE | teste | ALL | NULL | NULL | NULL | NULL | 5 | |
+----+-------------+-------+------+---------------+------+---------+------+------+-------+
1 row in set (0.00 sec)
[/mysql]

Very well, nothing to report.

But we also know what happens with MyISAM: after the DELETEs, holes start to appear in the data files (which, consequently, end up interfering with concurrent inserts). Let's have a look. I recreated the teste table with the original dataset and ran the two DELETEs we've done so far. Next:

[mysql]
mysql> SELECT @@concurrent_insert;
+---------------------+
| @@concurrent_insert |
+---------------------+
| 1 |
+---------------------+
1 row in set (0.00 sec)

mysql> INSERT INTO teste VALUES (6,'e'),(5,'f'),(4,'g');
Query OK, 3 rows affected (0.00 sec)
Records: 3 Duplicates: 0 Warnings: 0

mysql> SELECT SQL_NO_CACHE * FROM teste;
+------+-------+
| _int | _char |
+------+-------+
| 5 | f |
| 6 | e |
| 1 | b |
| 1 | z |
| 2 | b |
| 2 | d |
| 3 | m |
| 4 | g |
+------+-------+
8 rows in set (0.00 sec)
[/mysql]

Notice that the first two records ended up out of the natural order because MyISAM managed to recycle the free space, in exactly the same spots. The third one landed in the only place where it fit: at the end. This is also an important observation because, if there are writes to the table while our operations run, special care is needed. It gets worse because we can't even disable this behaviour. Let's recreate the teste table once more, with the initial INSERT, then the two DELETEs done so far, and then the INSERT from the previous test:

[mysql]
mysql> SET global concurrent_insert = 0;
Query OK, 0 rows affected (0.00 sec)

mysql> INSERT INTO teste VALUES (6,'e'),(5,'f'),(4,'g');
Query OK, 3 rows affected (0.00 sec)
Records: 3 Duplicates: 0 Warnings: 0

mysql> SELECT SQL_NO_CACHE * FROM teste;
+------+-------+
| _int | _char |
+------+-------+
| 5 | f |
| 6 | e |
| 1 | b |
| 1 | z |
| 2 | b |
| 2 | d |
| 3 | m |
| 4 | g |
+------+-------+
8 rows in set (0.00 sec)
[/mysql]

Exactly the same. The natural order is therefore lost.

A note on InnoDB

As a side note, here is a brief explanation of why we are not analysing the InnoDB case:

[mysql]
mysql> CREATE TABLE teste2 like teste;
Query OK, 0 rows affected (0.00 sec)

mysql> INSERT INTO teste2 SELECT * FROM teste;
Query OK, 5 rows affected (0.01 sec)
Records: 5 Duplicates: 0 Warnings: 0

mysql> UPDATE teste2 SET _char = 'k' WHERE _int = 1 and _char = 'b';
Query OK, 1 row affected (0.03 sec)
Rows matched: 1 Changed: 1 Warnings: 0

mysql> ALTER TABLE teste2 engine=INNODB, DROP INDEX idx_int, DROP INDEX idx_char, DROP INDEX idx_int_char;
Query OK, 5 rows affected (0.01 sec)
Records: 5 Duplicates: 0 Warnings: 0

mysql> alter table teste2 add primary key (_char);
Query OK, 5 rows affected (0.02 sec)
Records: 5 Duplicates: 0 Warnings: 0

mysql> EXPLAIN SELECT SQL_NO_CACHE * FROM teste2;
+----+-------------+--------+------+---------------+------+---------+------+------+-------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+--------+------+---------------+------+---------+------+------+-------+
| 1 | SIMPLE | teste2 | ALL | NULL | NULL | NULL | NULL | 5 | |
+----+-------------+--------+------+---------------+------+---------+------+------+-------+
1 row in set (0.00 sec)

mysql> SELECT * FROM teste2;
+------+-------+
| _int | _char |
+------+-------+
| 2 | b |
| 2 | d |
| 1 | k |
| 3 | m |
| 1 | z |
+------+-------+
5 rows in set (0.00 sec)
[/mysql]

What I wanted you to notice is that there is no indication of any key being used and yet the records come out ordered by the PRIMARY KEY. As I said right at the start, with InnoDB the results would be different, and that is due to this storage engine's clustered index, which stores the records (physically, in the data files) in key order (the famous primary key order). As a consequence, it's not possible to access the records in natural order (i.e., the order in which they were inserted) - and, consistently with what we had seen for MyISAM, the Optimizer prefers not to use any index, passing the same expanded query down to the storage engine.

I think I've shown with reasonable confidence that we can fetch records with SELECT in a controlled way. Depending on the scenario, it's not always possible to use one method or the other (it may not be possible to create a new index, for instance), so before jumping to conclusions, the best thing to do is use EXPLAIN to explore the possibilities!