14 Nov 2018

ARTS-Week2

ARTS weekly

[ ARTS   DynamicPlanning   trie   Latex   ]

Algorithm

Problem

Leetcode 3. Longest Substring Without Repeating Characters

Given a string, find the length of the longest substring without repeating characters.

Example 1:

Input: “abcabcbb”

Output: 3

Explanation: The answer is “abc”, with the length of 3.

Later in this post, we call one string without repeating characters is a non-repeating string.

Thought

Brute-force method

The direction is to check all the possible substring if it’s a valid non-repeating string. The inner most operation which checks if a substring is the one that has none repeating character, is of time complexity O(n), so the total time complexity will be O(n3), where n indicates the length of input string;

for head_index in 0 to end
	for trail_index in end to head_index+1
		if substring(head_index, trail_index) is valid non-repeating string
			update max_substring_length
		end if
	end for
end for

Dynamic planning

In the former method, checking if one substring is a non-repeating string is of O(n) time complexity. If one substring s contains repeating character, thus any substring contains s in it is absolutely not non-repeating string neither. So we can use this relationship to optimize former method, and it’s based on dynamic planning theory.

Given input string s, we can convert this problem to one that calculates the length of longest substring ends with the character of string s.

For simplicity, we denote C(i) to indicate the length of the longest non-repeating substring ends with character s[i], and b[i] the start index of this substring.

After we have iterated all characters in s, the largest one in C(i) will correspond to the longest non-repeating substring.

Because we donot need to know the exact start and end position of this substring, but only the max length, so we only need two variable, one holds the current max and b[i] for previous index i.

max_total <- 0
b_pre <- 0

for i in 0 to end
	if substring(b_pre, i-1) contains s[i]
		b_pre <- first occurency pos
	else
		max_total <- max(max_total, i-b_pre+1)
	endif
endfor      

The main improvement of this method is that we could use O(logn) time complexity to check if one string contains a character, and the total complexity is O(nlogn).

Best implementation

In the previous section, we use substring.find method to check if one substring contains a character. It can also be omitted using a auxiliary key-value map.

We use an map visited to hold the latest index of new character c. For each character c in input string, if visited doesn’t contain key c, then the local longest substring could append c to become even longer. Otherwise, we need to check if the last visted index is at the range of the local longest substring. If not, appending c to local longest substring doesnot modify the non-repeating attribute.

Let’s demonstrate it with code:

max_local <- 0
max_total <- 0
visited <- empty map

for i in 0 to end
	if not visited(s[i]) or i - visited(s[i]) > max_local
		max_local <- max_local + 1
	else
		max_total <- max(max_total, max_local)		
		max_local <- i - visited(s[i])
	endif

	visited(s[i]) <- i
endfor    

max_total <- max(max_total, max_local)

This implementation only need one loop to iterate all characters of input string, so the total time complexity is O(n), but with O(n) space complexity.

Review

The Adaptive Radix Tree: ARTful Indexing for Main-Memory Databases

The author of this article shared to us a paper about an optimized trie structure.

We extract the definition of traditional trie in wiki here:

In computer science, a trie, also called digital tree, radix tree or prefix tree is a kind of search tree—an ordered tree data structure used to store a dynamic set or associative array where the keys are usually strings.

It has an appealing property that is:

  • the number of nodes you have to visit to search for your key depends on the length of the key and not on the number of nodes in the trie!
  • better search efficiency over B-tree, or hash table.

But this brings in an apparent disadvantage:

  • heavy memory cost that vanilla radix tries can incur, because each node contains a pointer for every possible value of the ‘next’ character in the key

So the author introduces ART(Adaptive Radix Tree) which decreases the memory requirements through cutting down the fan-out of each node adaptively.

The paper distinguishes four different cases, for nodes with up to 4, 16, 48 and 256 children respectively. It introduces the ability to change the datastructure used for each internal node depending on how many children the node actually has, rather than how many it might have. Thus, the total memory cost can decrease according to actual usage.

To store the value corresponding to the key, we have three methods, and each have cons/pros:

  • Single value nodes are leaf nodes which store exactly one value. They’d be represented in our scheme as a different Node type. However, because of the increased tree height, it causes one additional pointer traversal per lookup.
  • Multi-value leaves are just like the regular Node[4|16|48|256] types, but the child_ptrs now become an array of Value*. This only applies to situations where all keys in a tree have the same length.
  • Pointer/value slots store values directly in the slots otherwise used for child pointers, and distinguishes between them based on the highest bit - let’s say 0 to interpret the pointer as a child node pointer, and 1 to interpret it as a pointer to a value.

The paper also mentioned many useful tricks when it comes to adopt ART to your application, for further information about the paper, please refer to ART.

Tip

Sometimes when you need to edit complex math equations, the first choice maybe something like MS-Word, or Pages. I rarely write ariticles with these tools, because they are too heavy to me, and I use Markdown editor instead.

Simple is the best.

And today, I will recommend to you an online math equation editor, which is based on Latex synatx, Click here: Latex online editor. It supports markdown smoothly, and you can also save as png which can be embedded to your file.

If you are a Latex expert, you can input math equations after Latex synatx, or you can use GUI tools to help you. So easy is it.

Take Mass–energy equivalence for example. After you have finished input, you can paste the html code directly to your Markdown note.

<a href="https://www.codecogs.com/eqnedit.php?latex=E&space;=&space;m*c^2" target="_blank"><img src="https://latex.codecogs.com/gif.latex?E&space;=&space;m*c^2" title="E = m*c^2" /></a>

And the effect is as follows:

May it help!

Share

Technology changes a lot, we always have to work hard to keep up with it. You may find it hard to learn new frameworks or skills, but you have to.

This article Achieving impossible coding tasks without knowing how to do them shows us with great positive energy, and I think you may benefit from it.

The author is once a front-end programmer, journalist writing news articles about clean energy technology, and some other roles. He always paint himself to a corner and behaves well at the end. Take his one experience for example. At the first time when he decided to work as a journalist, he had little knowledge about journalism, and had no idea about where to start from, but he learned by doing, eventually writing a couple thousand news articles. He did it anyway, and I think it’s amazing.

The author writes in this article: Constantly heading into the unknown can be nerve-wracking, and more importantly trigger fears that would prevent us from accomplishing our goals. This pattern isn’t limited to writing software, it can crop up anywhere in our life. What matters is our skill in navigating through the unknown to make it through to the other side.

So, make a target, and just do it!