Listed are plans/directions the project is going to do in the next stage.


  • merge CONST and VAR tokens.
  • different symbol weight: Math token > math variable > sub/sup-script. (IDF) struct TF-IDF $$ \sum_q \max_{m \in T(q), n \in D} \sum_{t \in \mathfrak{T}(m)} \big( \min(|t|, \text{TF}_{t,D}) \times \text{IDF}_t \big) $$
  • on-disk math index compression, faster indexer, index-stage init threshold.
  • Re-design representation:
    • eliminate the impact of sup/subscripts in some cases, e.g., definite and indefinite integrals.
    • And also prime variable, e.g., x and x’.
    • being able to differentiate \(\sum_{i=0}^n x_i = x\) and \(\sum_{i=0} x_i^n = x\).
    • Solution: e.g., \sum lifted to operator, leaving a base to match variable, hanging there with sub/sup-scriptions.
  • boolean query language support (must, should, must-not).
  • Field search (index many sources and search MSE tag for example).
  • [✓] put some large resources on CDN (
  • [✓] Show last update of index and some visit statistics at homepage.
  • [✓] faster TeX rendering using mathjax v3.
  • [✓] Increase cache postlist hit chance by caching only long posting lists.
  • [✓] scalability: Multiple nodes on each core or different machines (using MPI)
  • [✓] re-entrant posting list iterators and MNC scoring.
  • [✓] Combined math and text search under new model.
  • [✓] Operand match highlight.
  • [✓] Wildcard under new model.
  • [✓] Prefix model efficiency: MaxScore-like pruning.
  • [✓] Path operators hashing to distinguish operator symbols.


  • picture input UI interface on mobile platform, handwritten input on PC.
  • QAC, spelling correction, search suggestion.
  • faster Chinese tokenizer
  • Return informative msg on query TeX parse error.
  • indexing automation.
  • Special posting list for big number exact match, e.g., “1/2016”.
  • Semantics
    • math equivalence awareness, e.g. 1+1/n = (1+n)/n.
    • Text synonym awareness, e.g. horse = pony.
    • Embedding of both text and math, e.g. pythagorean == x^2 + y^2 = z^2

Consider additional indexing sources

  • MathOverflow
  • CrossValidated
  • physics stackexchange
  • Wolfram MathWorld
  • Wikipedia (English version)
  • Socratic
  • PlanetMath
  • Proof wiki