mybatis 的缓存是怎么回事

Java 真的是任何一个中间件,比较常用的那种,都有很多内容值得深挖,比如这个缓存,慢慢有过一些感悟,比如如何提升性能,缓存无疑是一大重要手段,最底层开始 CPU 就有缓存,而且又小又贵,再往上一点内存一般作为硬盘存储在运行时的存储,一般在代码里也会用内存作为一些本地缓存,譬如数据库,像 mysql 这种也是有innodb_buffer_pool来提升查询效率,本质上理解就是用更快的存储作为相对慢存储的缓存,减少查询直接访问较慢的存储,并且这个都是相对的,比起 cpu 的缓存,那内存也是渣,但是与普通机械硬盘相比,那也是两个次元的水平。

闲扯这么多来说说 mybatis 的缓存,mybatis 一般作为一个轻量级的 orm 使用,相对应的就是比较重量级的 hibernate,不过不在这次讨论范围,上一次是主要讲了 mybatis 在解析 sql 过程中,对于两种占位符的不同替换实现策略,这次主要聊下 mybatis 的缓存,前面其实得了解下前置的东西,比如 sqlsession,先当做我们知道 sqlsession 是个什么玩意,可能或多或少的知道 mybatis 是有两级缓存,

一级缓存

第一级的缓存是在 BaseExecutor 中的 PerpetualCache,它是个最基本的缓存实现类,使用了 HashMap 实现缓存功能,代码其实没几十行

public class PerpetualCache implements Cache {

  private final String id;

  private final Map<Object, Object> cache = new HashMap<>();

  public PerpetualCache(String id) {
    this.id = id;
  }

  @Override
  public String getId() {
    return id;
  }

  @Override
  public int getSize() {
    return cache.size();
  }

  @Override
  public void putObject(Object key, Object value) {
    cache.put(key, value);
  }

  @Override
  public Object getObject(Object key) {
    return cache.get(key);
  }

  @Override
  public Object removeObject(Object key) {
    return cache.remove(key);
  }

  @Override
  public void clear() {
    cache.clear();
  }

  @Override
  public boolean equals(Object o) {
    if (getId() == null) {
      throw new CacheException("Cache instances require an ID.");
    }
    if (this == o) {
      return true;
    }
    if (!(o instanceof Cache)) {
      return false;
    }

    Cache otherCache = (Cache) o;
    return getId().equals(otherCache.getId());
  }

  @Override
  public int hashCode() {
    if (getId() == null) {
      throw new CacheException("Cache instances require an ID.");
    }
    return getId().hashCode();
  }

}

可以看一下BaseExecutor 的构造函数

protected BaseExecutor(Configuration configuration, Transaction transaction) {
    this.transaction = transaction;
    this.deferredLoads = new ConcurrentLinkedQueue<>();
    this.localCache = new PerpetualCache("LocalCache");
    this.localOutputParameterCache = new PerpetualCache("LocalOutputParameterCache");
    this.closed = false;
    this.configuration = configuration;
    this.wrapper = this;
  }

就是把 PerpetualCache 作为 localCache,然后怎么使用我看简单看一下,BaseExecutor 的查询首先是调用这个函数

@Override
  public <E> List<E> query(MappedStatement ms, Object parameter, RowBounds rowBounds, ResultHandler resultHandler) throws SQLException {
    BoundSql boundSql = ms.getBoundSql(parameter);
    CacheKey key = createCacheKey(ms, parameter, rowBounds, boundSql);
    return query(ms, parameter, rowBounds, resultHandler, key, boundSql);
  }

可以看到首先是调用了 createCacheKey 方法,这个方法呢,先不看怎么写的,如果我们自己要实现这么个缓存,首先这个缓存 key 的设计也是个问题,如果是以表名加主键作为 key,那么分页查询,或者没有主键的时候就不行,来看看 mybatis 是怎么设计的

@Override
  public CacheKey createCacheKey(MappedStatement ms, Object parameterObject, RowBounds rowBounds, BoundSql boundSql) {
    if (closed) {
      throw new ExecutorException("Executor was closed.");
    }
    CacheKey cacheKey = new CacheKey();
    cacheKey.update(ms.getId());
    cacheKey.update(rowBounds.getOffset());
    cacheKey.update(rowBounds.getLimit());
    cacheKey.update(boundSql.getSql());
    List<ParameterMapping> parameterMappings = boundSql.getParameterMappings();
    TypeHandlerRegistry typeHandlerRegistry = ms.getConfiguration().getTypeHandlerRegistry();
    // mimic DefaultParameterHandler logic
    for (ParameterMapping parameterMapping : parameterMappings) {
      if (parameterMapping.getMode() != ParameterMode.OUT) {
        Object value;
        String propertyName = parameterMapping.getProperty();
        if (boundSql.hasAdditionalParameter(propertyName)) {
          value = boundSql.getAdditionalParameter(propertyName);
        } else if (parameterObject == null) {
          value = null;
        } else if (typeHandlerRegistry.hasTypeHandler(parameterObject.getClass())) {
          value = parameterObject;
        } else {
          MetaObject metaObject = configuration.newMetaObject(parameterObject);
          value = metaObject.getValue(propertyName);
        }
        cacheKey.update(value);
      }
    }
    if (configuration.getEnvironment() != null) {
      // issue #176
      cacheKey.update(configuration.getEnvironment().getId());
    }
    return cacheKey;
  }

首先需要 id,这个 id 是 mapper 里方法的 id, 然后是偏移量跟返回行数,再就是 sql,然后是参数,基本上是会有影响的都加进去了,在这个 update 里面

public void update(Object object) {
    int baseHashCode = object == null ? 1 : ArrayUtil.hashCode(object);

    count++;
    checksum += baseHashCode;
    baseHashCode *= count;

    hashcode = multiplier * hashcode + baseHashCode;

    updateList.add(object);
  }

其实是一个 hash 转换,具体不纠结,就是提高特异性,然后回来就是继续调用 query

@Override
  public <E> List<E> query(MappedStatement ms, Object parameter, RowBounds rowBounds, ResultHandler resultHandler, CacheKey key, BoundSql boundSql) throws SQLException {
    ErrorContext.instance().resource(ms.getResource()).activity("executing a query").object(ms.getId());
    if (closed) {
      throw new ExecutorException("Executor was closed.");
    }
    if (queryStack == 0 && ms.isFlushCacheRequired()) {
      clearLocalCache();
    }
    List<E> list;
    try {
      queryStack++;
      list = resultHandler == null ? (List<E>) localCache.getObject(key) : null;
      if (list != null) {
        handleLocallyCachedOutputParameters(ms, key, parameter, boundSql);
      } else {
        list = queryFromDatabase(ms, parameter, rowBounds, resultHandler, key, boundSql);
      }
    } finally {
      queryStack--;
    }
    if (queryStack == 0) {
      for (DeferredLoad deferredLoad : deferredLoads) {
        deferredLoad.load();
      }
      // issue #601
      deferredLoads.clear();
      if (configuration.getLocalCacheScope() == LocalCacheScope.STATEMENT) {
        // issue #482
        clearLocalCache();
      }
    }
    return list;
  }

可以看到是先从 localCache 里取,取不到再 queryFromDatabase,其实比较简单,这是一级缓存,考虑到 sqlsession 跟 BaseExecutor 的关系,其实是随着 sqlsession 来保证这个缓存不会出现脏数据幻读的情况,当然事务相关的后面可能再单独聊。

二级缓存

其实这个一级二级顺序有点反过来,其实查询的是先走的二级缓存,当然二级的需要配置开启,默认不开,
需要通过

<setting name="cacheEnabled" value="true"/>

来开启,然后我们的查询就会走到

public class CachingExecutor implements Executor {

  private final Executor delegate;
  private final TransactionalCacheManager tcm = new TransactionalCacheManager();

这个 Executor 中,这里我放了类里面的元素,发现没有一个 Cache 类,这就是一个特点了,往下看查询过程

@Override
  public <E> List<E> query(MappedStatement ms, Object parameterObject, RowBounds rowBounds, ResultHandler resultHandler) throws SQLException {
    BoundSql boundSql = ms.getBoundSql(parameterObject);
    CacheKey key = createCacheKey(ms, parameterObject, rowBounds, boundSql);
    return query(ms, parameterObject, rowBounds, resultHandler, key, boundSql);
  }

  @Override
  public <E> List<E> query(MappedStatement ms, Object parameterObject, RowBounds rowBounds, ResultHandler resultHandler, CacheKey key, BoundSql boundSql)
      throws SQLException {
    Cache cache = ms.getCache();
    if (cache != null) {
      flushCacheIfRequired(ms);
      if (ms.isUseCache() && resultHandler == null) {
        ensureNoOutParams(ms, boundSql);
        @SuppressWarnings("unchecked")
        List<E> list = (List<E>) tcm.getObject(cache, key);
        if (list == null) {
          list = delegate.query(ms, parameterObject, rowBounds, resultHandler, key, boundSql);
          tcm.putObject(cache, key, list); // issue #578 and #116
        }
        return list;
      }
    }
    return delegate.query(ms, parameterObject, rowBounds, resultHandler, key, boundSql);
  }

看到没,其实缓存是从 tcm 这个成员变量里取,而这个是什么呢,事务性缓存(直译下),因为这个其实是用 MappedStatement 里的 Cache 作为key 从 tcm 的 map 取出来的

public class TransactionalCacheManager {

  private final Map<Cache, TransactionalCache> transactionalCaches = new HashMap<>();

MappedStatement是被全局使用的,所以其实二级缓存是跟着 mapper 的 namespace 走的,可以被多个 CachingExecutor 获取到,就会出现线程安全问题,线程安全问题可以用SynchronizedCache来解决,就是加锁,但是对于事务中的脏读,使用了TransactionalCache来解决这个问题,

public class TransactionalCache implements Cache {

  private static final Log log = LogFactory.getLog(TransactionalCache.class);

  private final Cache delegate;
  private boolean clearOnCommit;
  private final Map<Object, Object> entriesToAddOnCommit;
  private final Set<Object> entriesMissedInCache;

在事务还没提交的时候,会把中间状态的数据放在 entriesToAddOnCommit 中,只有在提交后会放进共享缓存中,

public void commit() {
    if (clearOnCommit) {
      delegate.clear();
    }
    flushPendingEntries();
    reset();
  }