Iter - to be professional: February 2011

Monday, February 28, 2011

find the shortest substring covering all given characters.

A long string text, given some characters, find the shortest
substring covering all given characters.
请问大家有没有想法，谢谢

quick thoughts：

两个指针p,q，先前进q直到cover了所有characters，记录当前长度
然后单步前进q，每次都尽量前进p使得pq间仍旧cover了所有characters，
maintain最短长度

需要用一个map维护出现的字符

复杂度O(n)因为相当于scan了2次

假设str[0..n-1]表示整个字符串。基本思路是从str[0]开始，以每个位置作为substring的起点，从该位置往后直到substring能cover所有字符。那么最多就有n个起点。问题是，如何每次只用O(1)来确定substring长度。

我想可以用一个辅助数组记录每种字符出现在字符串中的位置。例如字符a出现在str[0]和str[20]两个地方。假设当前已经确定以str[0]为起点的substring长度。接下来应该确定以str[1]为起点的substring长度。该substring丢弃了str[0]，所以必须要至少在 str[20]结束，否则字符a没被cover。这就是基本思路。

我用一个例子解释整个算法。假设字符串包括0到9的数字，整个串如下
0247324596818329654001378
第一步扫描字符串，记录下每种字符位置，如下
0：0，19，20
1：11，21
2：1，5，14
3：4，13，22
4：2，6，18
5：7，17
6：9，16
7：3，23
8：10，12，24
9：8，15

第二步，以str[0]为起点，substring的终点在str[11].
接下来，以str[1]为起点，因为丢弃了str[0]，即字符0，那么从辅助数组可以找到下一个字符0出现在str[19]，因此substring结束于str[19]。
这样每次substring起点往后一个字符，用O(1)时间可以确定结束位置。

随机数讨论

rand5()是没有能力只通过一次随机数的产生，和一些四则运算
，来获得更大范围的随机数的。
（前提是，这里说的都是整数哦。。。如果是实数范围的话另当别论啦。）
所以必须通过多次rand5()才行呢

Expand a random range from 1-5 to 1-7

int i;
do
{
i = 5 * (rand5() - 1) + rand5(); // i is now uniformly random between 1
and 25
} while(i > 21);
// i is now uniformly random between 1 and 21
return i % 7 + 1; // result is now uniformly random between 1 and 7

WHy can't I just put

i = 6*(rand5()-1);

* 1没有保证平均概率
* 2返回的数值的范围也不对呀

5*(rand5()-1)：产生0,5,10,15或者20
5 * (rand5() - 1) + rand5()就是1-25之间的随机数
如果我没有记错的话，后面那个while应该是:
if(i<21) return i%7

while(i > 21);
// i is now uniformly random between 1 and 21
return i % 7 + 1;
产生的是2-5，就不对了

至于楼主的做法，
i = 6*(rand5()-1)，产生的是0,6，12，18或者24

面经

Amazon 面试题
grep不愧为Amazon的最爱问题。
两次电面都问OOD...不容易。

在网上申请的SDE职位。一共两轮电话面试，一轮onsite。

第一轮电话面试：
1、解释Hash Table，包括“可以用什么数据结构实现hash table”，“what is a
good hash function”，“什么是load factor”。

2、算法：删除一个给定数列中重复的元素。

* 这道题看起来简单但是还是很多陷阱啊。。。
比如删除是什么意思？只是把array element mark为一个不可能的值（比如-1），还是
输出一个崭新的没有重复的数列。

3、merge两个有序数组。要求先给他解释算法，再写代码。他当时给了我十分钟的时间
，让我写好以后发到他邮箱里。

k=i+j
a[i],b[j],c[k]

int ii=jj=kk=0;
while(kk<k){
        if(a[ii]>b[jj]){
                c[kk]=b[jj];
                jj++;
        } else {
                c[kk]=a[ii];
                ii++;
        }
        kk++;
}

4、OOD：设计一个汽车出租（Car Rental Agency）的系统。他先问我如果要实现
vehicle search，需要哪些类；然后又问要实现rent a car，又需要哪些类；最后问如
果快到了交车截至时间，需要向用户发送提醒的邮件，应该怎么做。

第二轮电话面试：
1、Favorite project。

2、你最喜欢的排序算法，它是怎么工作的，有什么优点和缺点。我说的是选择排序，
他又问有什么排序算法比它的时间复杂度小，并且同样要描述一下它们是怎么工作的。

3、算法、写代码：
给了两个数组，要求找出他们之中相同的元素，并且将相同的元素存储在一个新的数组
里，再输出。如果数组里有重复的元素，比如array1中有四个5，array2中有两个5，那
么新数组里只存储两个5。要求我写代码，再念给他听。

4、OOD：设计restaurant reservation系统。

5、好像是一个开放性问题，我没有完全理解：log file中存储了很多order的信息，每
个order都有一个oid，字符串格式（xxx-xxxxxxx-xxxxxxx）。要求从log file中把所
有的oid都提取出来，问我怎么做。

on-site：
一、给定一篇文章，用户想要搜索一个单词，要求给出搜索单词的建议（就像一般的搜
索引擎的那种功能）。要求描述算法、复杂度并写代码。

二、先问了几个行为问题。然后问了两个简单的链表问题，单链表中查找倒数第n个数
和判断链表中是否有环。编程题问的是boggle游戏的问题：给定一个4*4的矩阵，每个
位置有一个字母，可以从一个位置跳转到周围八个相邻位置中的任何一个，但不能跳到
已经访问过的位置，要求找出所有的单词（假设给定了一个词典）。
http://en.wikipedia.org/wiki/Boggle

三、午餐，一个经理问了一些简历上的projects和实习的经历，然后又介绍了一下他们
组的工作。他说这次面试我的主要是Kindle组的。

四、不是Kindle组的，我估计是传说中的bar raiser。第一个问题是给了一个很大的文
件（不能完全放入内存），其中每一行存一个整数，要求判断这个文件中的数有没有重
复。然后是一个开放性问题：一台服务器每过三天就要挂一次，需要重启才能再次使用
，每次重启需要一分钟的时间；问有什么方法能解决这个问题。

* 服务器那个题我答的不好，因为老是不知道他在问什么，他想要的答案是什么。反正他
给了我很多提示，比如可以用交替使用两台机器，还需要一个load balancer，这样可
以在一台服务器挂的时候正好用上另一台正常工作的服务器，再重启挂的这个机器。

五、实现一个整数除法的函数，不使用除号，可以使用加号、减号和乘号，函数只返回
商。
--

文件找重复的那道题我觉得可以用hashtable，也可以external sorting。hashtable有
个问题是如果内存中放不下这个哈希表的话怎么办？我说可以使用多台机器，每台存放
一定范围内的数字（类似于MapReduce中map的过程），然后再分别对每个机器处理。

上来就一个问题,设计一个网上预定飞机票的系统

讨论了50分钟左右,弄的很细,每个class里有啥variable,啥method,都要说. 比如
Controller class谁去call, 怎么用Controller class. 还有内存怎么寸(不用数据库,
全存memory里),用什么数据结构,(他提出hashtable),什么作为key,什么作为value
一开始还好,后来纠结在一个如何给用户一个指定日期的航班信息,因为我没有存日期,
最后时间快到了,他让我把想好的设计发邮件给他...

贪心算法

贪心算法（又称贪婪算法）是指，在对问题求解时，总是做出在当前看来是最好的选择。也就是说，不从整体最优上加以考虑，他所做出的仅是在某种意义上的局部最优解。贪心算法不是对所有问题都能得到整体最优解，但对范围相当广泛的许多问题他能产生整体最优解或者是整体最优解的近似解。

the workflow working on a team using mercurial

When you’re working on a team using mercurial, your workflow is going to look a lot like this:

If you haven’t done so in a while, get the latest version that everyone else is working off of:

hg pull
hg up

Make some changes
Commit them (locally)
Repeat steps 2-3 until you’ve got some nice code that you’re willing to inflict on everyone else
When you’re ready to share:

hg pull to get everyone else’s changes (if there are any)
hg merge to merge them into yours
test! to make sure the merge didn’t screw anything up
hg commit (the merge)
hg push

an idea about farm

buy or rent a woods, or a fruit tree farm, there should be a lot of glasses and bugs and insects, then we can raise a lot of chicken, those chicken will be very delicious because their are raised like wild chicken in the mountain, eating worms and insects and earthworm, not feed. so I believe the taste of their meet will be the same as the wild chicken. also we can provide a hotel and restaurant to cook those kind of wild meat. I think the business will be good. so the good idea for a good farm.

an idea about elearning

I have worked in e-learning industry for a long time. so I am thinking, why not develop a python e-learning system and run it on google platform, I can provide the chinese course, chinse history, chinese geography, chinese math to people, especially the chinese people who immigrate to foreign country, they hope their children can learn something about china. I think this is a excellent idea. I will plan to do it.

Tuesday, February 22, 2011

Using Threads in Twisted

* the benefits of twisted.thread.

* how to use the thread of twisted.

Hash - Eternally Confuzzled

A hash table, put simply, is an abstraction of an array that allows any value to be used as an index. While an array requires that indices be integers, a hash table can use a floating-point value, a string, another array, or even a structure as the index. This index is called the key, and the contents of the array element at that index is called the value. So a hash table is a data structure that stores key/value pairs and can be quickly searched by the key. Because insertion and removal are operations dependent on the speed of the search, they tend to be fast as well.
To achieve this magic, a hash table uses a helper function that converts any object into an integral index suitable for subscripting the array.

FYI: I have 1337 coins

Here is an interview question you would expect from a typical Google interview, which is an interesting problem itself. Different solutions from multiple perspectives are provided in this post.

There are n coins in a line. (Assume n is even). Two players take turns to take a coin from one of the ends of the line until there are no more coins left. The player with the larger amount of money wins.

Would you rather go first or second? Does it matter?

Assume that you go first, describe an algorithm to compute the maximum amount of money you can win.

U.S. coins in various denominations in a line. Two players take turn to pick a coin from one of the ends until no more coins are left. Whoever with the larger amount of money wins.

use hash to improve the performance

two arrays, too loops, usually the complex will be O(n*n).
but if we use a hash dict to store one array, then we can go through every item in the array filter by the hash dict directly, that will be very fast, and the complex only be O(n+n).

Iter - to be professional

Pages