日期:2014-05-16  浏览次数:20664 次

关于postgresql索引的问题
各位大牛,
  小弟现在有两张表,
  Table "public.commit_bugs"
 Column | Type | Modifiers
---------+-----------------------+----------------------------------------
 bug_id | integer | not null default 0
 log_md5 | character varying(32) | not null default ''::character varying
Indexes:
  "commit_bugs_bug_id_idx" btree (bug_id, log_md5)

  Table "public.commit_logs"
  Column | Type | Modifiers
-----------+-----------------------------+--------------------------
 log_id | integer | not null default nextval('commit_logs_log_id_seq'::regclass)
 start_log | bigint | not null default (0)::bigint
 date | timestamp without time zone | not null
 user_name | character varying(255) | not null default ''::character varying
 log_md5 | character varying(32) | not null default ''::character varying
Indexes:
  "commit_logs_pkey" PRIMARY KEY, btree (log_id)
  "commit_log_date_idx" btree (date)
  "commit_log_md5_idx" btree (log_md5)
  "commit_logs_start_log_idx" hash (start_log) CLUSTER

第一张表大概5w的数据,第二张表150w的数据
下面的查询
select cb.bug_id,cl.log_md5 from commit_logs cl, commit_bugs cb where cb.log_md5=cl.log_md5 and cb.bug_id=205935;居然没有使用这个索引commit_log_md5_idx" btree (log_md5),导致查询的速度非常慢,大概要12s的时间。
下面是explain查看的查询规划:
  QUERY PLAN
---------------------------------------------------------------------
 Hash Join (cost=364.41..136125.63 rows=1653113 width=38) (actual time=12089.641..12268.713 rows=2 loops=1)
  Hash Cond: (("outer".log_md5)::text = ("inner".log_md5)::text)
  -> Seq Scan on commit_logs cl (cost=0.00..31840.87 rows=1456487 width=34) (actual time=0.045..6469.226 rows=1456487 loops=1)
  -> Hash (cost=363.84..363.84 rows=227 width=38) (actual time=0.239..0.239 rows=1 loops=1)
  -> Bitmap Heap Scan on commit_bugs cb (cost=4.80..363.84 rows=227 width=38) (actual time=0.206..0.212 rows=1 loops=1)
  Recheck Cond: (bug_id = 205935)
  -> Bitmap Index Scan on commit_bugs_bug_id_idx (cost=0.00..4.80 rows=227 width=0) (actual time=0.154..0.154 rows=1 loops=1)
  Index Cond: (bug_id = 205935)
 Total runtime: 12268.843 ms


可是我的另外一个postgresql服务器上,完全相同的数据库和表(数据量不同),同样的查询却使用了索引,所以只需要0.9ms
下面是另外一个服务器上的同样的查询的查询规划,
  QUERY PLAN
-----------------------------------------------------------------------
 Nested Loop (cost=3.79..4149.56 rows=538 width=40) (actual time=0.710..0.846 rows=2 loops=1)
  -> Index Scan using commit_bugs_bug_id_idx on commit_bugs cb (cost=0.00..19.20 rows=5 width=40) (actual time=0.085..0.097 rows=1 loops=1)
  Index Cond: (bug_id = 205935)
  -> Bitmap Heap Scan on commit_logs cl (cost=3.79..823.26 rows=225 width=36) (actual time=0.596..0.698 rows=2 loops=1)
  Recheck Cond: (("outer".log_md5)::text = (cl.log_md5)::text)
  -> Bitmap Index Scan on commit_log_md5_idx (cost=0.00..3.79 rows=225 width=0) (actual time=0.498..0.498 rows=2 loops=1)
  Index Cond: (("outer".log_md5)::text = (cl.log_md5)::text)
 Total runtime: 0.961 ms