# TODO

Listed below are the plans and directions the project intends to pursue in the next stage.

## Demanding

- lower search granularity to sector tree.
- merge CONST and VAR tokens.
- different symbol weights: math token > math variable > sub/superscript.

\[
\left\{
\begin{array}{ll}
\text{Score} &= \sum_t \text{sf}_{t,d} \cdot \text{idf}_{t,d} \\
&\\
\text{sf}_{t,d} &= S_{\text{sy}} \left( (1 - \theta) + \theta \frac{1}{\log(1 + \operatorname{leaves}(T_d))} \right) \\
&\\
\text{idf}_{t,d} &= \sum_{p \in \mathfrak{T}(M(t, d))} \log \frac{N}{\text{df}_p}
\end{array}
\right.
\]
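The scoring formula above can be sketched in a few lines of Python. This is only an illustration under assumed readings of the symbols, not the project's implementation: `symbol_score` stands for \(S_{\text{sy}}\), `doc_leaves` for \(\operatorname{leaves}(T_d)\), `n_docs` for \(N\), and `matched_path_dfs` for the document frequencies \(\text{df}_p\) of the matched paths. All function and parameter names are hypothetical.

```python
import math


def structure_factor(symbol_score, theta, doc_leaves):
    """sf = S_sy * ((1 - theta) + theta / log(1 + leaves(T_d))).

    Assumes doc_leaves >= 1 so that the logarithm is positive.
    """
    return symbol_score * ((1 - theta) + theta / math.log(1 + doc_leaves))


def idf(matched_path_dfs, n_docs):
    """idf = sum over matched paths p of log(N / df_p)."""
    return sum(math.log(n_docs / df) for df in matched_path_dfs)


def score(matches, theta, n_docs):
    """Score = sum over matched query terms t of sf * idf.

    Each match is a dict with hypothetical keys:
      's_sy'   : symbol similarity score for the match
      'leaves' : number of leaves in the matched document formula tree
      'dfs'    : document frequencies of the matched paths
    """
    return sum(
        structure_factor(m["s_sy"], theta, m["leaves"]) * idf(m["dfs"], n_docs)
        for m in matches
    )


if __name__ == "__main__":
    example = [{"s_sy": 1.0, "leaves": 5, "dfs": [10, 50]}]
    print(score(example, theta=0.5, n_docs=1000))
```

With `theta = 0` the length penalty vanishes and the structure factor reduces to the symbol score; larger `theta` values penalize formulas with many leaves more strongly.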

- on-disk math index compression, faster indexer, index-stage init threshold.
- Re-design representation:
    - eliminate the impact of sup/subscripts in some cases, e.g., definite and indefinite integrals.
    - and also prime variables, e.g., \(x\) and \(x'\).
    - be able to differentiate \(\sum_{i=0}^n x_i = x\) and \(\sum_{i=0} x_i^n = x\).
    - Solution: e.g., `\sum` lifted to an operator, leaving a `base` to match the variable, hanging there with its sub/superscripts.

- **boolean** query language support (must, should, must-not).
- Field search (index many sources and search MSE tag for example).
- [✓] put some large resources on CDN (jsdelivr.com)
- [✓] Show last update of index and some visit statistics at homepage.
- [✓] faster TeX rendering using mathjax v3.
- [✓] **Increase cache postlist hit chance** by caching only long posting lists.
- [✓] scalability: Multiple nodes on each core or different machines (using MPI)
- [✓] re-entrant posting list iterators and MNC scoring.
- [✓] **Combined math and text search** under new model.
- [✓] Operand match **highlight**.
- [✓] **Wildcard** under new model.
- [✓] Prefix model efficiency: MaxScore-like **pruning**.
- [✓] Path **operators hashing** to distinguish operator symbols.
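
As a rough illustration of the boolean query item (must, should, must-not) in the list above, the sketch below filters document IDs with plain set operations. It assumes one common convention: `should` keywords only select documents when no `must` keyword is given, and ranking is ignored entirely. The function name, parameters, and in-memory `postings` dictionary are hypothetical and do not reflect the project's query language or index format.

```python
def boolean_filter(postings, must=(), should=(), must_not=()):
    """Return the set of doc IDs satisfying a must/should/must-not query.

    postings: dict mapping a keyword to the set of doc IDs containing it
              (a stand-in for real posting lists).
    """
    if must:
        # Every 'must' keyword has to be present.
        result = set.intersection(*(postings.get(k, set()) for k in must))
    elif should:
        # Without 'must' keywords, at least one 'should' keyword has to match.
        result = set().union(*(postings.get(k, set()) for k in should))
    else:
        # Pure 'must-not' query: start from every indexed document.
        result = set().union(*postings.values())
    for k in must_not:
        # Drop any document containing a forbidden keyword.
        result -= postings.get(k, set())
    return result


if __name__ == "__main__":
    index = {"integral": {1, 2, 3}, "limit": {2, 3, 4}, "series": {3}}
    print(boolean_filter(index, must=["integral", "limit"], must_not=["series"]))
```

In a real engine the `should` keywords would typically still contribute to the ranking score when `must` keywords are present; this sketch only covers the filtering side.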

## Misc

- picture input UI on mobile platforms, handwritten input on PC.
- query auto-completion (QAC), spelling correction, search suggestions.
- faster Chinese tokenizer
- Return an informative message on query TeX parse errors.
- indexing automation.
- Special posting list for big number exact match, e.g., “1/2016”.
- Semantics
    - math equivalence awareness, e.g. 1+1/n = (1+n)/n.
    - **Text synonym** awareness, e.g. horse = pony.
    - Embedding of both text and math, e.g. pythagorean == x^2 + y^2 = z^2

## Consider additional indexing sources

- artofproblemsolving.com
- matheducators.stackexchange.com
- MathOverflow
- CrossValidated
- physics stackexchange
- Wolfram MathWorld
- Wikipedia (English version)
- Socratic
- NIST DLMF
- https://brilliant.org/
- PlanetMath
- Proof wiki