Mlam and its cache strategy
A little introduction
This page describes a simple technique that makes a web page much faster to serve: generate it once (with all the database accesses it needs), save the result on the server, and then return the saved file for every later access until something on the page changes. Even if it sounds easy, some problems are inherent to this strategy, and two questions are obvious:
- what if (for instance) I want a counter on my page, which makes it different each time it is accessed?
- is there a way to really know that the content of my page has changed?
Mlam offers an answer to both questions, along with a complete, optimized way of managing its cache.

A wonderful sample
To explain mlam's strategy precisely, here is a beautiful page from a surely beautiful site:
The beautiful title (1)
(2)Hello, Mr. beautiful

There are 3 beautiful members
and 8 beautiful but anonymous people here
(3)Beautiful article (read 18 times)

Excellent content of this beautiful article showing incredible things

Some comments: the title (1) is obviously static, so we don't need to cache anything there. The left column (2) is typically "different each time" content (personalization and statistics). The third area (3) may be worth caching, even if we have to increment the read counter each time someone accesses the article. The best strategy may be to cache only the content of the article (which may be heavy to generate). That is what mlam lets you implement quite easily.

How to initialise the cache in mlam?
To make mlam's cache work, you need a specific table in the database edited with mlam, and a directory in which your web server can write and create directories. The script to create the table is:

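CREATE TABLE gestion_cache (
-- an auto_increment column cannot also carry a DEFAULT value
id bigint(20) NOT NULL auto_increment,
-- 1: must be generated at next call, 2: cached file present on disk
etat tinyint(4) DEFAULT '0' NOT NULL,
url tinytext NOT NULL,
url_complete mediumtext NOT NULL,
-- crc32 of the url and of the parameter string
hash_url int(6) DEFAULT '0' NOT NULL,
hash_var int(6) DEFAULT '0' NOT NULL,
date_modif date,
-- the primary key already indexes id, so no separate KEY id is needed
PRIMARY KEY (id),
KEY hash_url (hash_url),
KEY hash_var (hash_var)
);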


How to cache a page portion ?
You need to create a php page that generates the portion you want to cache. In the sample above, we only want to generate the content of the article, so we create a page called, for instance, content_article.php. This page typically takes one parameter, the id of the article, and its php code uses this variable to generate the correct content. The content does not include the html header and body elements (in this sample), but it may otherwise contain any kind of html.
Then let's go back to the main php code of the page. We write everything needed for the left column and the title of the article, and then we reach the portion we want to cache. There are at most three lines to write:

$base=connecte_base("your_base",1); (1)
$objet_cache = new CacheManager("cache_directory","site_main_url"); (2)
print $objet_cache->GetUrl("dir/content_article","id=$id"); (3)

(1) we use mlam's internal database connection process (just to avoid rewriting one)
(2) we instantiate the cache manager with two values:
cache_directory is the local directory that will contain the cached files (something like "/www/cache/"; don't forget the final /)
site_main_url is the main url of your site, something like http://www.yoursite.com/
(3) we call the cache manager to insert the content of the article here. The function GetUrl takes two arguments: the url path (what comes after http://www.yoursite.com/) and the parameters.
With these simple lines, mlam checks whether this page is cached. If not, it generates it, saves the content on disk and returns it. If so, it returns the saved content directly and efficiently.

How to decache a page portion ?
The cache manager offers a simple function to decache a portion, which works like this:
$base=connecte_base("your_base",1); (1)
$objet_cache = new CacheManager("cache_directory","site_main_url"); (2)
$objet_cache->ClearCache("site/dir/content_article","id=$id",1); (3)

(1) we use mlam's internal database connection process (just to avoid rewriting one)
(2) we instantiate the cache manager
(3) the ClearCache function takes three arguments. The first and second identify the url to clear. The third specifies whether to delete the file on disk (1), or just to tell mlam that this url must be regenerated at the next call (0), which overwrites the existing file.

Why is mlam useful for clearing process ?
Each cached portion depends on some database data. It is when this data changes (typically through an administration tool) that the cache must be cleared, and cleared selectively (there is no need to clear the whole site if only one page has changed). In our sample, it is obvious that we must clear the cache for the article whose id is 10 each time the content of that article is changed in the administration tool.
Mlam comes with a file containing a function that is called each time mlam updates something in a database. This function is located in mlam_cache.php (in the same directory as mlam_tables_def.php). The developer must write the body of this function to implement the cache administration; mlam ensures that this is the only place where this work has to be done. The function is
function test_decache($table,$wherechamps,$wherevalues,$array_prepare)
$table contains the name of the table changed.
$wherechamps and $wherevalues are two arrays that identify the changed data: $wherechamps contains the list of fields, and $wherevalues the corresponding value(s).
$array_prepare may contain specific values (see later). It can be empty.
In our article sample, when mlam changes the content of the article with the id 10, the values will be :
$table="article_table"
$wherechamps=["article_id"]
$wherevalues=[10]
$array_prepare=""
So it is quite easy to test the value of $table, watch for changes in the table article_table, and then clear the corresponding portion with a line like:
$objet_cache->ClearCache("site/dir/content_article","id=".$wherevalues[0],1);
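
The test described above could be sketched as follows. This is only an illustration: CacheManagerStub is a hypothetical stand-in that records calls so the code runs without mlam or a database, and keeping the cache manager in $GLOBALS is an assumption about how the function reaches it.

```php
<?php
// Hypothetical stand-in for mlam's CacheManager: it only records the calls,
// so that this sketch can run without mlam or a database.
class CacheManagerStub
{
    public $cleared = array();

    public function ClearCache($url, $params, $delete_file)
    {
        $this->cleared[] = array($url, $params, $delete_file);
    }
}

// Assumption: the cache manager is shared through $GLOBALS.
$GLOBALS['objet_cache'] = new CacheManagerStub();

function test_decache($table, $wherechamps, $wherevalues, $array_prepare)
{
    // Only react to changes in the article table.
    if ($table !== "article_table") {
        return;
    }

    // Find the position of article_id among the "where" fields.
    $pos = array_search("article_id", $wherechamps);
    if ($pos === false) {
        return;
    }

    // Clear the cached portion of this article and delete its file (1).
    $GLOBALS['objet_cache']->ClearCache(
        "site/dir/content_article",
        "id=" . $wherevalues[$pos],
        1
    );
}

// Simulate the call mlam would make after updating article 10.
test_decache("article_table", array("article_id"), array(10), "");
// A change in another table is ignored.
test_decache("member_table", array("member_id"), array(3), "");
print_r($GLOBALS['objet_cache']->cleared);
```

Looking up article_id by name rather than assuming it is at index 0 keeps the function working if the update uses several "where" fields.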

Moreover, mlam makes it possible to trace some data across an update, so you can be very precise when choosing which pages to clear afterwards. The function for this is called prepare_decache, and it is called just before an update in the database. The variables sent to this function are the name of the table, the field names and the values used in the "where" clause of the update. If this function returns an array (whatever it contains), this array will be available in the function test_decache after the update (in $array_prepare). Typically, this lets you read the value of one (or more) fields before the update, save these values in an array and return it. After the update, you can check whether these values have changed, which can be very useful when deciding whether or not to clear some pages.
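
Here is a minimal sketch of this before/after comparison. It rests on several assumptions: the parameter list of prepare_decache is inferred from the description above, the database is replaced by a hypothetical in-memory array, and the url "site/dir/article_list" is invented for the illustration.

```php
<?php
// Hypothetical in-memory "table" standing in for the real database.
$GLOBALS['fake_articles'] = array(
    10 => array("title" => "Old title", "content" => "Some text"),
);
// Urls that the sketch decides to clear (instead of calling ClearCache).
$GLOBALS['urls_to_clear'] = array();

// Called just before the update: remember the current title.
function prepare_decache($table, $wherechamps, $wherevalues)
{
    if ($table !== "article_table") {
        return null;
    }
    $id = $wherevalues[0];
    return array(
        "id" => $id,
        "old_title" => $GLOBALS['fake_articles'][$id]["title"],
    );
}

// Called just after the update: compare with what was saved before.
function test_decache($table, $wherechamps, $wherevalues, $array_prepare)
{
    if ($table !== "article_table" || !is_array($array_prepare)) {
        return;
    }
    $id = $array_prepare["id"];

    // The article content is always regenerated.
    $GLOBALS['urls_to_clear'][] = "site/dir/content_article?id=" . $id;

    // The (hypothetical) list of titles only changes if the title did.
    if ($GLOBALS['fake_articles'][$id]["title"] !== $array_prepare["old_title"]) {
        $GLOBALS['urls_to_clear'][] = "site/dir/article_list";
    }
}

// Simulate what happens around an update of article 10.
$saved = prepare_decache("article_table", array("article_id"), array(10));
$GLOBALS['fake_articles'][10]["title"] = "New title";
test_decache("article_table", array("article_id"), array(10), $saved);
print_r($GLOBALS['urls_to_clear']);
```

If the update had left the title untouched, only the article content would have been cleared.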

But mlam doesn't make all the changes in the tables ?
It may, mainly thanks to a function meant to simplify insertions and updates in the database. The usual way to insert data in the database (except for developers using their own functions) is something like this:
$insert = mysql_query("insert into table (col1,col2,col3) values (value1,value2,value3)");
Even if this function is quite simple, it easily becomes confusing when it involves variables and text, mainly because misusing " and ' leads to errors (this was my case). To make an insertion, Mlam offers this method:
$insert = mlam_table_change("table",array("col1"=>"value1","col2"=>"value2","col3"=>"value3"));

You don't have to worry about the type of your variables anymore, Mlam handles it for you.
To update your values, instead of writing:
$update = mysql_query("update table set col1=value1,col2=value2,col3=value3 where col4=value4 and col5=value5");
You can now use:
$update = mlam_table_change("table",array("col1"=>"value1","col2"=>"value2","col3"=>"value3"),array("col4"=>"value4","col5"=>"value5"));

This function is quite easy to use, and it also calls mlam's cache functions (prepare_decache and test_decache), which lets you test your values there.
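
To give an idea of how an array-based helper removes the quoting problem, here is a sketch of a function that turns the same arrays into a statement with ? placeholders. It is not mlam's actual code: build_change_sql is a hypothetical name, and the placeholder approach is my own choice for the illustration.

```php
<?php
// Hypothetical helper, not mlam's implementation: builds an INSERT when
// $where is empty and an UPDATE otherwise. Values are never written into
// the sql text; they are returned separately, in placeholder order.
function build_change_sql($table, array $values, array $where = array())
{
    $cols = array_keys($values);
    $params = array_values($values);

    if (count($where) === 0) {
        $marks = implode(",", array_fill(0, count($cols), "?"));
        $sql = "insert into $table (" . implode(",", $cols) . ") values ($marks)";
        return array($sql, $params);
    }

    $set = array();
    foreach ($cols as $col) {
        $set[] = "$col=?";
    }
    $cond = array();
    foreach ($where as $col => $value) {
        $cond[] = "$col=?";
        $params[] = $value;
    }
    $sql = "update $table set " . implode(",", $set)
         . " where " . implode(" and ", $cond);
    return array($sql, $params);
}

list($sql, $params) = build_change_sql(
    "table",
    array("col1" => "it's text", "col2" => 2),
    array("col4" => "value4")
);
echo $sql, "\n";
print_r($params);
```

The statement and the values could then be handed to a database layer that supports prepared statements. Note that the table and column names are inserted as they are, so they must come from your own code and never from user input.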

Isn't it a little bit trivial ?
Maybe :)
This cache method is not really a revolution, but it works very efficiently, and I haven't yet shown you all the options of the various functions of the cache manager, so here we go:

Multi-variable cache: In the samples above, the cached portion used just one value as a parameter, but the function GetUrl accepts things like "var1=something&var2=beautiful&var3=is&var4=happening". Moreover, "var4=happening&var3=is&var1=something&var2=beautiful" is the same thing for Mlam: the variables are the same, and their order does not matter.
Real partial cache clearing: Mlam has a function ClearPartialCache. Consider one url called once with the parameters "var1=10&var2=5" and once with "var1=10&var2=6". When ClearPartialCache is called with only "var1=10", both urls are cleared. If ClearPartialCache is called with "var2=5", only the first url is cleared.
Global cache clearing: A specific function, ClearAll, ignores url parameters. It only accepts 2 parameters: the url name, and the option to really delete the files from the server. All the cached entries for that url, whatever their parameters, are cleared.
Avoiding an oversized cache: Finally, Mlam offers an easy way to clear old cache entries. The function ClearDate takes only one parameter, a number (x). All pages put in the cache more than x months ago are cleared when this function is called. Because it is intended to avoid piling up useless files, there is no option to choose whether the files are deleted. They will be deleted!
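
The first two options can be illustrated with a small sketch. The function names normalize_params and params_match are hypothetical; they only show one possible way to get order-independent parameters and partial matching, not how Mlam does it internally.

```php
<?php
// Sort "a=1&b=2" style parameters so that their order no longer matters.
function normalize_params($params)
{
    $pairs = ($params === "") ? array() : explode("&", $params);
    sort($pairs, SORT_STRING);
    return implode("&", $pairs);
}

// True when every name=value pair of $wanted is also present in $cached.
function params_match($cached, $wanted)
{
    $cached_pairs = explode("&", $cached);
    foreach (explode("&", $wanted) as $pair) {
        if (!in_array($pair, $cached_pairs, true)) {
            return false;
        }
    }
    return true;
}

// Both orders give the same normalized string.
$a = normalize_params("var1=something&var2=beautiful&var3=is&var4=happening");
$b = normalize_params("var4=happening&var3=is&var1=something&var2=beautiful");
var_dump($a === $b);

// Partial clearing: which cached parameter strings match "var1=10"?
foreach (array("var1=10&var2=5", "var1=10&var2=6") as $cached) {
    var_dump(params_match($cached, "var1=10"));
}
```

With "var2=5" as the wanted string, only the first of the two cached entries would match.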

Have you tested the performance of mlam's cache ?
A little bit :=)
From now on, each time mlam is called on a cached portion, some information is inserted in the result (in html comments). When the portion is created, mlam writes the time used for the whole creation process (creation, file saving, and database notification), along with the time spent just getting the data to save. The result looks something like:
<!-- Mlam 1.0 cache generation : 0.080754 sec (of which 0.079754 sec generating the content) -->
It gives an idea of the time "lost" by saving the cache compared with not using it.
When the portion is already present, mlam adds the time used to get the data (that is, looking in the database to know whether it is present, then reading the data). The result then looks like:
<!-- Mlam 1.0 cache reading : 0.00096399999999996 sec -->
<!-- Mlam 1.0 cache generation : 0.080754 sec (of which 0.079754 sec generating the content) -->

In this sample, the server needs something like 0.08 seconds to generate my page. Saving the content in a file and updating the database adds something like 0.001 seconds. After that, the content is available to other users in 0.001 seconds instead of 0.08 seconds: the ratio is simply 80.
So you are now able to form an opinion about the quality of mlam's cache system.
By the way, the samples above come directly from www.humano.com, a website I developed using mlam's cache. You're welcome to verify these values just by looking at the source code of the pages :)
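
For readers who want to produce this kind of comment themselves, here is a minimal sketch of how the two durations could be measured with microtime. The function generate_with_timing and the exact comment text are assumptions made for the illustration, not mlam's code.

```php
<?php
// Hypothetical helper: runs $generate, measures the elapsed time and
// appends an html comment similar to the ones shown above.
function generate_with_timing($generate)
{
    $start = microtime(true);
    $content = call_user_func($generate);
    $generated = microtime(true);

    // A real implementation would save the file and update the database
    // here; this sketch does nothing, so both durations are nearly equal.
    $end = microtime(true);

    return $content . sprintf(
        "\n<!-- cache generation : %.6f sec (of which %.6f sec generating the content) -->",
        $end - $start,
        $generated - $start
    );
}

// Stand-in for a page that would normally query the database.
function sample_article()
{
    return "<p>Excellent content of this beautiful article</p>";
}

echo generate_with_timing('sample_article'), "\n";
```

The measured values depend on the machine, so the numbers printed will differ from one run to the next.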

A little explanation ?
When the function GetUrl is called, only one sql query is made in the database, and it searches only for numbers in indexed columns. The result of this query is meant to be unique and contains no more than 255 characters (server limitation). Then a file is opened and read once to be printed. That's all. I know that the sample above doesn't do much more than that, but it was meant to make the main ideas behind mlam's cache easy to understand. So compare this with all the data you look up in your databases when creating your content, and remember that searching a database costs more, in every respect, than reading a single file, to form your own opinion of this cache method.

How does it work precisely ?
Let's consider this sample (a real one this time, used while developing mlam):
$objet_cache->GetUrl("jdr/pages/cache_visuplanete","id=$id");
GetUrl works like this :
1/ call a function to sort the parameters in alphanumeric order (so that any order is accepted)
2/ look in database for data where hash_url=1308534935 and hash_var=327306477
1308534935 is a number generated by the php function crc32 using the string "jdr/pages/cache_visuplanete"
327306477 is a number generated by the php function crc32 using the string "id=1"
3/ if nothing is found, insert in database the values url="jdr/pages/cache_visuplanete", url_complete="jdr/pages/cache_visuplanete?id=1", hash_url=1308534935, hash_var=327306477, etat=1, date_modif=now, and then call GetUrl one more time with the same parameters.
4/ if more than one occurrence is found, look for the correct one by comparing the strings (crc32 may give the same result for different strings, even if it is quite unlikely)
5/ if etat=1, we open the url, take the content, save it in a file named jdr_pages_cache_visuplaneteid=1, update the table to put etat=2 and return the content. If etat=2, we read the same file and return the content.

It is thanks to the crc32 function that all the search procedures are greatly optimized (looking for numbers is far faster than looking for strings). I won't detail here how the clearing functions work, because there is nothing very specific about them (they still use the crc32 codes). The only thing to add is that keeping two values for each url (with and without parameters) is very useful for the ClearPartialCache function: it can easily reach all the entries for a url and then look for the desired parameters in order to clear the correct urls.
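
The five steps above can be put together in a short sketch. Everything here is a simplification under stated assumptions: the gestion_cache table and the cache directory are replaced by hypothetical in-memory arrays, the page is produced by a callback instead of an http request, and the second call to GetUrl of step 3 is replaced by simply continuing in the same function.

```php
<?php
// Hypothetical stand-ins for the gestion_cache table and the cache directory.
$GLOBALS['rows'] = array();
$GLOBALS['files'] = array();
$GLOBALS['calls'] = 0;

function sort_params($params)
{
    $pairs = explode("&", $params);
    sort($pairs, SORT_STRING);
    return implode("&", $pairs);
}

function get_url_sketch($url, $params, $generate)
{
    $params = sort_params($params);          // step 1
    $hash_url = crc32($url);                 // step 2: numbers, not strings
    $hash_var = crc32($params);
    $complete = $url . "?" . $params;

    // Steps 2 and 4: match on the two numbers first, then compare the
    // strings to rule out a crc32 collision.
    $found = null;
    foreach ($GLOBALS['rows'] as $i => $row) {
        if ($row['hash_url'] === $hash_url && $row['hash_var'] === $hash_var
            && $row['url_complete'] === $complete) {
            $found = $i;
            break;
        }
    }

    // Step 3: unknown url, register it with etat=1 (to be generated).
    if ($found === null) {
        $GLOBALS['rows'][] = array(
            'url' => $url, 'url_complete' => $complete,
            'hash_url' => $hash_url, 'hash_var' => $hash_var, 'etat' => 1,
        );
        $found = count($GLOBALS['rows']) - 1;
    }

    // Step 5: generate and save when etat=1, otherwise read the saved copy.
    $file = str_replace("/", "_", $url) . $params;
    if ($GLOBALS['rows'][$found]['etat'] === 1) {
        $GLOBALS['files'][$file] = call_user_func($generate, $params);
        $GLOBALS['rows'][$found]['etat'] = 2;
    }
    return $GLOBALS['files'][$file];
}

// Stand-in for the real page; counts how often it is actually generated.
function fake_page($params)
{
    $GLOBALS['calls']++;
    return "content for " . $params;
}

$first = get_url_sketch("jdr/pages/cache_visuplanete", "id=1", 'fake_page');
$second = get_url_sketch("jdr/pages/cache_visuplanete", "id=1", 'fake_page');
var_dump($first === $second, $GLOBALS['calls']);
```

The second call returns the saved content without running the generator again, which is the whole point of the cache.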

That's it for the cache strategy in mlam. Don't hesitate to tell me what you think of it, or to ask me questions if you think this documentation is lacking something. You're always welcome.
Marc Hugon signing off!