Mon Jan 17 12:08:44 2000  Loic Dachary  <loic@ceic.com>

	* webbase-5.7 release
	
	* man/crawler.1: Add hook options documentation and -- terminator
	  warning.

Fri Jan 14 13:45:55 2000  Loic Dachary  <loic@ceic.com>

	* hooks/WebbaseHookMifluz.cc (WebbaseHookMifluz): config is now a
	  pointer, allocated by WordContext::Initialize from ~/.mifluz

Thu Jan 13 12:28:02 2000  Loic Dachary  <loic@ceic.com>

	* hooks/WebbaseHookMifluz.{cc,h}: wordRef is now a pointer because
	  wordkey has to be defined by config before we can allocate one.

Tue Jan 11 13:10:52 2000  Loic Dachary  <loic@ceic.com>

	* acinclude.m4 (CHECK_MIFLUZ): use htdb instead of db

Wed Jan 05 11:14:45 2000  Marcel Bosc  <bosc@ceic.com>

	* hooks/*: Adapted to changes in mifluz key mechanism
	  (as of mifluz-0.11)

Tue Jan 04 14:01:31 2000  Loic Dachary  <loic@ceic.com>

	* hooks/*: restructuring. WebbaseHook.cc is a base class
	  for indexers. WebbaseHookMifluz.cc is a derived class that
	  implements the interface to mifluz-0.10.

	* bin/*.cc,hooks/*.cc: use "-" in getopt_long to prevent reordering
	  of options. This does not solve everything: the end of options
	  is no longer detected.
	
	* crawler/html_content.lxx: fix redundant verbose + incorrect
	  fill_href prototype

	* crawler/*.cc, bin/*.cc: unknown options do not trigger errors.
	  This is annoying but dramatically simplifies the implementation
	  of options handling. No cascade or option collection is necessary;
	  each module handles the options it recognizes.

	* crawler/*.c: convert to C++, as a first step of the migration to C++

	* tools/WebbaseGetopt.cc: base class for options specific to a class.

	* tools/WebbaseDl.cc: encapsulate dynamic loading using libtool libltdl
	  for webbase purposes.

Wed Dec 15 11:37:14 1999  Loic Dachary  <loic@ceic.com>

	* webbase-5.6 release

	* crawler/crawl.c: fix major bug: file was deleted from WLROOT
	  if Not Modified.

	* test/*: use htdump instead of db_dump
	
	* hooks/hooks_mifluz: hardwired compression + cache +
	  page_size. Removed unnecessary extern C.
	
	* check/test_functions.in,check/config: use .my.cnf
	  for permissions. Patching config is no longer needed to run
	  the tests.
	
	* {tools,webbase}/*.h: add extern "C" everywhere

	* tools/md5.h: add #ifndef _md5_h

	* tools/salloc.h: remove include malloc.h

	* {bin,check}/*.c -> *.cc: main progs are C++

Wed Dec 15 11:18:38 1999  Loic Dachary  <loic@ceic.com>

	* configure.in: change LANG to C++

Thu Dec 09 17:27:16 1999  Loic Dachary  <loic@ceic.com>

	* acinclude.m4: upgraded CHECK_ZLIB

	* acinclude.m4 (AC_PROG_APACHE): documentation

Tue Dec 07 12:08:35 1999  Loic Dachary  <loic@ceic.com>

	* webbase-5.5 release

Tue Nov 30 12:08:35 1999  Loic Dachary  <loic@ceic.com>

	* crawler/html_content.l: test null in parse_print, fix
	  array bound write

Mon Nov 29 19:19:49 1999  Loic Dachary  <loic@ceic.com>

	* webbase-5.4 release

	* check/*: find in $srcdir
	
	* crawler/Makefile.am: added html_parser.h

	* Makefile.am: added .version

	* check/index_test (samples): indexed text is unaccented

	* tools/isomap.c (unaccent): added string_length argument

Fri Nov 26 10:59:21 1999  Loic Dachary  <loic@ceic.com>

	* webbase-5.3 release

	* check/*: include apache detection, autodetect modules

Thu Nov 25 19:35:00 1999  Loic Dachary  <loic@ceic.com>

	* crawler/html_*: complete rewrite of the html parser

	* hooks/*: isolate hooks in separate library

	* check/*: more tests for html parser

Wed Nov 10 17:18:20 1999  Quiedeville Rodolphe  <rodo@banquise.ceic.com>

	* man/crawler.1: -create option is exclusive; no other option is accepted.

Tue Nov 02 16:21:45 1999  Loic Dachary  <loic@ceic.com>

	* bin/furi2md5.c: convert FURI to FURI_MD5 (see uri(3))

Fri Oct 29 15:39:17 1999  Loic Dachary  <loic@ceic.com>

	* crawler/robots.c (robots_load_1): netloc is now a unique key, added rowid
	  to get a unique identifier per server. Handle the race condition when
	  two processes try to insert the same robots entry.
	
Fri Oct 29 11:13:46 1999  Loic Dachary  <loic@ceic.com>

	* crawler/webbase_url.c (webbase_url_start_ok): only canonical
	  and absolute urls are valid starting points.

Thu Oct 28 16:39:42 1999  Loic Dachary  <loic@ceic.com>

	* crawler/crawl.c (mirror_schedule): if delay <= 0, default to 1 week.

Thu Oct 28 15:27:21 1999  Loic Dachary  <loic@ceic.com>

	* bin/consistentc.c (fix_keys): implement -keys_url, -keys_md5, -keys_normalize

Thu Oct 28 09:23:41 1999  Loic Dachary  <loic@ceic.com>

	* crawler/webbase.c (webbase_unlock): uses md5 key instead of long
	  ascii names.

Wed Oct 27 16:31:54 1999  Loic Dachary  <loic@ceic.com>

	* crawler/webbase.c (webbase_insert_url): fix serious bug:
	  realloc(&p, &s, s + value) changed to realloc(&p, &s, value).

Tue Oct 26 18:48:00 1999  Loic Dachary  <loic@ceic.com>

	* crawler/webtools.c: implement -webtools_limit to limit the maximum size
	  of a document.

Tue Oct 26 16:01:22 1999  Loic Dachary  <loic@ceic.com>

	* bin/consistentc.c: consistentc -key canonicalizes urls

Fri Oct 22 13:58:03 1999  Loic Dachary  <loic@ceic.com>

	* crawler/webbase.c: use mysql_real_connect instead of deprecated
	  mysql_connect.

	* crawler/webbase.c: read defaults from ~/.my.cnf if options are missing

	* crawler/webbase.c: do not try to connect twice

	* crawler/webbase_create.c: add bz2 and wdz extensions to unknown mime
	  type

	* bin/crawler.c (init): added -schema to print default database schema

Thu Oct 21 18:57:34 1999  Loic Dachary  <loic@ceic.com>

	* port to freebsd-3.3
	
	* crawler/crawl.c,webtools.c: conditionally use ETIME, prefer ETIMEDOUT

Thu Oct 21 18:19:14 1999  Loic Dachary  <loic@ceic.com>

	* check/index_test: created

Tue Oct 19 19:00:14 1999  Loic Dachary  <loic@ceic.com>

	* crawler/hook_mifluz.cc: initial version

	* configure.in: implement --with-mifluz

Mon Oct 18 17:55:46 1999  Loic Dachary  <loic@ceic.com>

	* test/webbase_test: feed url_md5 + call consistentc -key
	  when manually inserting urls in start.

Fri Oct 15 10:31:34 1999  Loic Dachary  <loic@ceic.com>

	* crawler/webbase_url.c: add webbase_url_free and call
	  on context.webbase_url objects.

	* crawler/webbase.c: add webbase_start_free and call
	  on start objects.

Thu Oct 14 17:24:52 1999  Loic Dachary  <loic@ceic.com>

	* DEBUGGING: create

	* tool/getopt*: Upgraded

	* fix various warnings reported by purify.

	* crawler/webbase*.c: fix memory leak: do not reset
	  w_*_length to 0 in *_reset.

	* added .cvsignore everywhere

Thu Oct 14 11:04:41 1999  Loic Dachary  <loic@ceic.com>

	* bin/consistentc: added -keys, which rebuilds all the url_md5 keys in
	  the start and url tables.

Wed Oct 13 09:36:23 1999  Loic Dachary  <loic@ceic.com>

	* crawler: add url_md5 field in start and url tables. Modify all
	  sources to fill and use this field instead of url.

	* tools/md5str.[ch]: create

	* configure.in: cleanup, add link to mifluz

1999-07-30  Bertrand Demiddelaer <bert@ceic.com>

	* crawler/webtools.c (webtools_open_1): timeout for connect() added

Mon Jul 19 14:33:25 1999  Loic Dachary  <loic@ceic.com>

	* webbase-5.2 release

1999-07-17  Loic Dachary  <loic@ceic.com>

	* crawler/webbase.c (webbase_alloc): break if connection successful

	* crawler/dirsel.c (hnode_free): strdup key to prevent unexpected
	  deallocation

	* check: test suite

1999-07-15  Loic Dachary  <loic@ceic.com>

	* tools/dirname.[ch]: rename to urldirname to prevent conflict

1999-07-13  Loic DACHARY  <loic@home.ceic.com>

	* webbase-5.1 release

1999-07-09  Loic Dachary  <loic@ceic.com>

	* Initial import
