Wednesday, March 14, 2018

Things I learned during my internship

UDF, Hive
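To write a UDF in Eclipse, you need to install a plugin. In IntelliJ, you instead modify the Maven settings file so that the repository path points to the company's Maven repository.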


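Build -> Build Artifacts produces a packaged jar file in the out directory under the project directory.
Then upload it in D2 via New Resource -> JAR and finally click Submit; after that it can be used in the development environment.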




Command: find out how many partitions a table has
odpscmd -e "Show PARTITIONS search_kg.item_vectorial_aspects ;"
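To get the number directly instead of counting the printed list by eye, the output of the command above can be parsed. This is a minimal sketch that assumes each partition is printed on its own line as key=value pairs joined by "/" (e.g. ds=20180603); that output format, the sample text, and the helper name parse_partitions are assumptions for illustration.

```python
from typing import Dict, List


def parse_partitions(output: str) -> List[Dict[str, str]]:
    """Turn SHOW PARTITIONS text into a list of {partition_column: value} dicts."""
    partitions = []
    for line in output.splitlines():
        line = line.strip()
        if "=" not in line:
            continue  # skip blank lines and anything that is not a partition spec
        spec = {}
        for piece in line.split("/"):  # multi-level specs look like ds=.../hh=...
            key, _, value = piece.partition("=")
            spec[key] = value
        partitions.append(spec)
    return partitions


# Hypothetical sample output. In practice the text could be captured with
# subprocess.run(["odpscmd", "-e", "show partitions <table>;"],
#                capture_output=True, text=True).stdout
sample = """ds=20180601
ds=20180602
ds=20180603
"""
parts = parse_partitions(sample)
print(len(parts))  # number of partitions
print(parts[0])
```

Counting parsed specs rather than raw lines keeps blank or status lines from inflating the total.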

Create a table, then insert into it:
http://help.aliyun-inc.com/internaldoc/detail/27863.html?spm=a2c1f.8259796.2.66.PZtDOm
insert into table srcp partition (p='abc') values ('a',1),('b',2),('c',3);




insert into table <table_name> partition(ds='xxx')
select 'a','b' from dual;
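When the rows for an "insert into ... values" statement like the srcp example above come from a script, a small helper can generate the SQL text. This is a sketch only: insert_values_sql and sql_literal are hypothetical names, and escaping is deliberately not handled (strings containing quotes or backslashes are rejected rather than guessed at).

```python
from typing import Dict, Sequence, Union

Literal = Union[str, int, float]


def sql_literal(v: Literal) -> str:
    """Render a Python value as a SQL literal; strings are single-quoted."""
    if isinstance(v, str):
        if "'" in v or "\\" in v:
            # Escaping rules are out of scope for this sketch: refuse rather than guess.
            raise ValueError(f"unsupported character in {v!r}")
        return f"'{v}'"
    return repr(v)  # ints and floats print as plain numbers


def insert_values_sql(table: str, partition: Dict[str, str],
                      rows: Sequence[Sequence[Literal]]) -> str:
    """Build 'insert into table T partition (k='v') values (...),(...);'."""
    spec = ",".join(f"{k}={sql_literal(v)}" for k, v in partition.items())
    values = ",".join("(" + ",".join(sql_literal(v) for v in row) + ")" for row in rows)
    return f"insert into table {table} partition ({spec}) values {values};"


stmt = insert_values_sql("srcp", {"p": "abc"}, [("a", 1), ("b", 2), ("c", 3)])
print(stmt)
# insert into table srcp partition (p='abc') values ('a',1),('b',2),('c',3);
```

The generated string can then be passed to odpscmd -e like the other commands in these notes.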



odpscmd -e "jar -libjars zheng_content_free_sentiment_analysis.jar -classpath /Users/zhenggao/Documents/workspace/zheng_content_free_sentiment_analysis.jar -resources commons-math3-3.6.1.jar correlationAnalysis.PredictionCorrelationAnalysis zhenggao_normalized_item_aspect_predicted zhenggao_normalized_item_aspect zhenggao_pearson_correlation;"

odpscmd -e "add jar /Users/zhenggao/Documents/workspace/zheng_content_free_sentiment_analysis.jar -f"



odpscmd -e "set odps.graph.use.multiple.input.output=true;set odps.graph.worker.num=550;set odps.graph.worker.memory=32768; jar -libjars zheng_graph_random_walk.jar -classpath /Users/zhenggao/Documents/workspace/zheng_graph_random_walk.jar -resources commons-math3-3.6.1.jar,zhenggao_edge_type_transition_matrix UserBehaviorBasedContentGraph graph_vertex_filter_2 graph_edge_filter_2 zhenggao_edge_type_transition_matrix 2;"
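Long one-line commands like the graph job above are easy to break when editing the quoting by hand. A sketch of assembling the same command from parts and passing it to subprocess as an argument list (so no shell quoting is involved) might look like this; jar_statement and odpscmd_argv are hypothetical helpers, and the settings and arguments are copied from the command above.

```python
import shlex
from typing import Dict, List, Sequence


def jar_statement(libjar: str, classpath: str, resources: Sequence[str],
                  main_class: str, args: Sequence[str]) -> str:
    """Assemble a 'jar -libjars ... -classpath ... -resources ...' statement."""
    parts = ["jar", "-libjars", libjar, "-classpath", classpath,
             "-resources", ",".join(resources), main_class, *args]
    return " ".join(parts) + ";"


def odpscmd_argv(settings: Dict[str, str], statement: str) -> List[str]:
    """Prefix 'set key=value;' statements and return an argv list for subprocess."""
    sets = " ".join(f"set {k}={v};" for k, v in settings.items())
    return ["odpscmd", "-e", f"{sets} {statement}".strip()]


settings = {
    "odps.graph.use.multiple.input.output": "true",
    "odps.graph.worker.num": "550",
    "odps.graph.worker.memory": "32768",
}
stmt = jar_statement(
    "zheng_graph_random_walk.jar",
    "/Users/zhenggao/Documents/workspace/zheng_graph_random_walk.jar",
    ["commons-math3-3.6.1.jar", "zhenggao_edge_type_transition_matrix"],
    "UserBehaviorBasedContentGraph",
    ["graph_vertex_filter_2", "graph_edge_filter_2",
     "zhenggao_edge_type_transition_matrix", "2"],
)
argv = odpscmd_argv(settings, stmt)
print(shlex.join(argv))  # equivalent one-line shell command, properly quoted
# On a machine with odpscmd installed:
#   import subprocess; subprocess.run(argv, check=True)
```

Keeping the worker settings in a dict makes it easy to change worker.num or worker.memory without touching the rest of the command.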


SQL statements:
read gaozheng_title_segment 1

insert overwrite table gaozheng_title_segment select item as id, alinlp_segment(regexp_replace(title,' ',''),"MAINSE"," ") as segment from tmp_filtered_item_title;

select regexp_replace('ac d',' ','') from dual;



read search_kg_dev.user_aspect_item_temp partition(ds='20180603') 5;

tunnel

tunnel upload log.txt test_project.test_table/p1="b1",p2="b2"
tunnel download -fd ### review_content_for_each_period/period=20180610-20180615 review_content.txt;


If you don't have permission, append --user.
For example, when installing pip itself or using pip to install other packages: pip install numpy --user

On the server, use a virtual environment; otherwise pip3 cannot install packages.
Use python3 and pip3.
Activate: source /home/zheng.gz/env/bin/activate
Deactivate: deactivate
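The note doesn't say how /home/zheng.gz/env was created; assuming the standard-library venv module (what python3 -m venv uses), here is a small sketch of creating an env and checking whether the current interpreter is running inside one. The helper names are illustrative, and the demo env goes into a temporary directory.

```python
import sys
import tempfile
import venv
from pathlib import Path


def in_virtualenv() -> bool:
    """True when the running interpreter belongs to a venv."""
    # Inside a venv, sys.prefix is the env directory, while sys.base_prefix
    # still points at the interpreter the env was created from.
    return sys.prefix != sys.base_prefix


def create_env(path: Path, with_pip: bool = True) -> Path:
    """Create a venv at `path`, roughly like `python3 -m venv path`; return its bin dir."""
    venv.create(path, with_pip=with_pip)
    return path / ("Scripts" if sys.platform == "win32" else "bin")


print("running inside a venv:", in_virtualenv())

with tempfile.TemporaryDirectory() as tmp:
    # with_pip=False keeps this demo fast and offline; keep the default (True)
    # for a real env so that pip3 install works after `source <env>/bin/activate`.
    bin_dir = create_env(Path(tmp) / "env", with_pip=False)
    created = (Path(tmp) / "env" / "pyvenv.cfg").exists()
    print("pyvenv.cfg written:", created)
```

Because the env's pip installs into the env directory, it doesn't need the system-wide write access that plain pip3 lacks on the server.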

To install pip itself, see https://pip.pypa.io/en/stable/installing/#id7
tmux: only the root user can install it, with sudo yum install tmux



Sunday, March 11, 2018

Friday, March 9, 2018

PyTorch


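In addition, some operations (such as a transpose) leave a tensor non-contiguous in memory. In that case, call tensor.contiguous() to turn it into contiguous data; this copies the data, so the result no longer shares storage with the original tensor.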

e.is_contiguous()
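A small sketch, assuming a recent PyTorch, of a transpose producing a non-contiguous view of the same storage and contiguous() producing a separate copy:

```python
import torch

a = torch.arange(6).view(2, 3)   # freshly created tensor: contiguous
b = a.t()                        # transpose: a view with swapped strides
print(a.is_contiguous(), b.is_contiguous())   # True False

c = b.contiguous()               # copies b into new row-major storage
print(c.is_contiguous())         # True
print(b.data_ptr() == a.data_ptr())   # True: b shares a's storage
print(c.data_ptr() == a.data_ptr())   # False: c has its own copy

c[0, 0] = 100
print(a[0, 0].item())            # 0: writing into the copy does not touch a

# b.view(6) raises a RuntimeError because b is not contiguous; c.view(6) works.
flat = c.view(6)
print(flat)
```

This is the practical reason the call matters: view() needs contiguous memory, so a transposed tensor has to go through contiguous() (or reshape(), which copies only when needed) first.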


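In the hands-on linear regression example at the end of https://github.com/chenyuntc/pytorch-book/blob/master/chapter3-Tensor%E5%92%8Cautograd/Tensor.ipynb, db = dy.sum() can be understood by writing the model as y = wx + b*Tensor((1,1,1,1)), i.e. the scalar b multiplied by a vector of ones. So when taking the matrix derivative, db is the transpose of that ones vector times dy, (1,1,1,1)^T * dy, which is exactly dy.sum().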
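A quick numerical check of that argument in plain Python: for y = x*w + b*(1,1,1,1), the analytic gradient db = sum(dy) should match a finite-difference estimate. The data and the squared-error loss below are made up for illustration and are not taken from the notebook.

```python
def forward(x, w, b):
    # Adding the scalar b to every sample is the same as b * (1, 1, ..., 1).
    return [xi * w + b for xi in x]


def loss(y, t):
    return 0.5 * sum((yi - ti) ** 2 for yi, ti in zip(y, t))


x = [1.0, 2.0, 3.0, 4.0]
t = [3.0, 5.0, 7.0, 9.0]
w, b = 0.5, 0.1

y = forward(x, w, b)
dy = [yi - ti for yi, ti in zip(y, t)]     # dL/dy for the squared-error loss
dw = sum(xi * g for xi, g in zip(x, dy))   # x^T dy
db = sum(dy)                               # ones^T dy, i.e. dy.sum()

# Central finite differences as an independent check of both gradients.
eps = 1e-6
num_db = (loss(forward(x, w, b + eps), t) - loss(forward(x, w, b - eps), t)) / (2 * eps)
num_dw = (loss(forward(x, w + eps, b), t) - loss(forward(x, w - eps, b), t)) / (2 * eps)
print(f"db={db:.6f} numeric={num_db:.6f}")
print(f"dw={dw:.6f} numeric={num_dw:.6f}")
```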

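When using autograd, there are two ways to get the gradients of non-leaf nodes (by default they are not kept after backward); see https://github.com/chenyuntc/pytorch-book/blob/master/chapter3-Tensor%E5%92%8Cautograd/Autograd.ipynb
The notebook also shows how to write your own Function with a custom backward pass.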
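A minimal sketch, assuming a recent PyTorch, of getting the gradient of a non-leaf tensor with torch.autograd.grad and with a hook registered through register_hook (likely the two ways the notebook means, though that is an assumption), plus retain_grad() as a further option. The last part defines a custom torch.autograd.Function with a hand-written backward; MultiplyAdd is an illustrative name, and torch.autograd.gradcheck compares its backward against numerical gradients.

```python
import torch

x = torch.ones(3, requires_grad=True)                  # leaf
w = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)  # leaf

# Way 1: torch.autograd.grad returns dz/dy for the non-leaf y directly.
y = x * w                      # non-leaf: y.grad is not kept by default
z = (y ** 2).sum()
dz_dy, = torch.autograd.grad(z, y)   # 2 * y = [2., 4., 6.]

# Way 2: a hook on y is called with dz/dy during backward.
saved = {}


def save_y_grad(grad):
    saved["y"] = grad          # returning None leaves the gradient unchanged


y = x * w
z = (y ** 2).sum()
handle = y.register_hook(save_y_grad)
z.backward()
handle.remove()

# Also possible: call retain_grad() before backward so y.grad is populated.
y = x * w
y.retain_grad()
(y ** 2).sum().backward()
print(dz_dy, saved["y"], y.grad)


# Custom Function with its own backward pass (illustrative example).
class MultiplyAdd(torch.autograd.Function):
    @staticmethod
    def forward(ctx, w, x, b):
        ctx.save_for_backward(w, x)   # keep what backward needs
        return w * x + b

    @staticmethod
    def backward(ctx, grad_out):
        w, x = ctx.saved_tensors
        # d(wx+b)/dw = x, d/dx = w, d/db = 1; the chain rule multiplies by grad_out.
        return grad_out * x, grad_out * w, grad_out


inputs = tuple(torch.randn(4, dtype=torch.double, requires_grad=True) for _ in range(3))
print(torch.autograd.gradcheck(MultiplyAdd.apply, inputs))  # True if backward is correct
```

gradcheck uses double-precision inputs because its finite-difference comparison is too noisy in float32.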