Report Name
On General Binary Search SQLi Algorithm Theory
Report Author
Bedirhan Urgun, urgunb at hotmail dot com, www.webguvenligi.org
Report Date
February 2008
The Theory
What interests us is the rate of growth / order of growth of the running time.
And the running time of an algorithm on a particular input is the number of
primitive operations or "steps" executed. It is convenient to define the
notion of step so that it is as machine-independent as possible. In our case
"step" is # of requests.
In RAM (Random Access Machine) model a step is after another, in other words
there is no concurrency.
Theta(g(n)) = {f(n): for constants c0, c1,
n0; 0<=c1.g(n) <= f(n) <=
c2.g(n) for all n>=n0}
Analysis
Let db record's length be fetched be n and the function of # of requests produced by the bisection algorithm be f(n).
For a single character, # of requests can be bounded between;
1 <= {# of requests to find a single character} <= 7
The best case, which is the first try being a hit, is 1 and the worst case is 7, which is the last try being a hit or none for that matter. For a whole data (db record) of length n this becomes;
1.n <= f(n) <= 7.n,
0 <= 1.n <= f(n) <= 7.n for n >= 1, where c0 = 1, c1 = 7, n0 = 1
as such Theta(n) = f(n), which basically says the order of growth of the tools' running time (utilizing bisection algorithm) is linear in terms of the length of the data to be fecthed.
Algorithms (Pseudocode)
1.
int search(c0, c1){
p = (c1 - c0)/2 + c0
if(c0 == p)
return c0
r = query(p) // tests if (x < p) condition where x is the current character's ascii value
if(r)
return search(c0,p)
else
return search(p,c1)
}
2.
int search(c0, c1){
p = (c1 - c0)/2 + c0
r = query1(p) // tests if (x < p) condition where x is the current character's ascii value
if(r)
return search(c0,p)
else{
r = query2(p) // tests if (x > p) condition where x is the current character's ascii value
if(r)
return search(p,c1)
else
return p;
}
}
3.
...
Optimizing the Binary Search Algorithm in Blind SQL Injection
Theta(n) also shows that the algorithm used is asymptotically optimal. The complexity for even the "mappable blind sql injection", which resembles B-trees, is Theta(n).
To reduce the searching operation, a binary search tree should be height-balanced (self-balancing or AVL), which is keeping number of levels after the root element at a bare minimum. A tool (blind sql injector) has to create a balanced search tree, however, for performance reasons the tree may not strictly be height-balanced. It may be more efficient to keep a custom-balanced binary search tree. For readable ASCII domain comprises 32 to 126 decimal. If we can use uppercase function and deprioritize {, |, } and ~ characters, then we can proiritize the 65-90 (A-Z) range when creating our binary search tree.
There may of course other custom-balanced binary search trees exist in the tool for numerical heavy data and classes alike. Moreover, the tool can switch between custom-balanced binary search trees according to miss rates increasing for the selected binary search tree in use.
How do tools understand the Enf Of Input?
1.1 extra character requests (sqlmap)
1.2 checking against NULL (bsql)
1.3 calculating LENGTH and acting accordingly
(sqlix, absinthe)
References:
[1] Introduction to Algorithms; Cormen, Leiserson and Rivest
[2] http://webpages.ull.es/users/jriera/Docencia/AVL/AVL%20tree%20applet.htm