Wednesday, March 14, 2018

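Things I learned during my internship

UDF, Hive
To write a UDF in Eclipse, you need to install a plugin. In IntelliJ, you need to edit the settings file instead, so that the repository path points to the company's Maven repository.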


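Build -> Build Artifacts produces a packaged jar file in the out directory under the project folder.
Then upload it to D2 via New Resource -> Jar and finally click Submit; after that you can use it in the development environment.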




Command to find out how many partitions a table has:
odpscmd -e "Show PARTITIONS search_kg.item_vectorial_aspects ;"
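SHOW PARTITIONS lists the partitions rather than printing a count. A minimal sketch for getting the number from a script, assuming odpscmd is on PATH and that the output has one partition spec per line (e.g. `ds=20180603`); if your odpscmd build prints extra log lines containing "=", filter them out first:

```python
import subprocess


def count_partitions(show_partitions_output: str) -> int:
    """Count partition specs in text printed by SHOW PARTITIONS.

    Assumes one spec per line, e.g. "ds=20180603" or "ds=20180603/hh=01";
    lines without "=" (blank lines, banners) are ignored.
    """
    return sum(1 for line in show_partitions_output.splitlines() if "=" in line)


def show_partitions(table: str) -> str:
    """Run SHOW PARTITIONS through odpscmd and return its stdout.

    Requires odpscmd on PATH with a configured project.
    """
    result = subprocess.run(
        ["odpscmd", "-e", f"show partitions {table};"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


if __name__ == "__main__":
    sample = "ds=20180601\nds=20180602\n\nds=20180603\n"
    print(count_partitions(sample))  # 3
```

With a real table: `count_partitions(show_partitions("search_kg.item_vectorial_aspects"))`.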

Create a table, then insert into it:
http://help.aliyun-inc.com/internaldoc/detail/27863.html?spm=a2c1f.8259796.2.66.PZtDOm
insert into table srcp partition (p='abc') values ('a',1),('b',2),('c',3);




insert into table <table_name> partition (ds='xxx')
select 'a','b' from dual;
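When the rows come from a script, the VALUES form of the insert can be generated instead of typed by hand. A minimal sketch; `build_insert` and `sql_literal` are hypothetical helpers, and the backslash escaping of quotes is an assumption (Hive-style string literals), so check it against your SQL dialect before feeding it untrusted text:

```python
def sql_literal(value) -> str:
    """Render a Python value as a SQL literal."""
    if value is None:
        return "NULL"
    if isinstance(value, bool):  # check before int: bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, (int, float)):
        return str(value)
    # Assumption: backslash escaping inside single-quoted strings (Hive style).
    text = str(value).replace("\\", "\\\\").replace("'", "\\'")
    return f"'{text}'"


def build_insert(table: str, partition: dict, rows: list) -> str:
    """Build `insert into table ... partition (...) values (...),(...);`."""
    spec = ", ".join(f"{key}={sql_literal(val)}" for key, val in partition.items())
    values = ",".join(
        "(" + ",".join(sql_literal(v) for v in row) + ")" for row in rows
    )
    return f"insert into table {table} partition ({spec}) values {values};"


if __name__ == "__main__":
    # Reproduces the srcp example from the help page above.
    print(build_insert("srcp", {"p": "abc"}, [("a", 1), ("b", 2), ("c", 3)]))
```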



odpscmd -e "jar -libjars zheng_content_free_sentiment_analysis.jar -classpath /Users/zhenggao/Documents/workspace/zheng_content_free_sentiment_analysis.jar -resources commons-math3-3.6.1.jar correlationAnalysis.PredictionCorrelationAnalysis zhenggao_normalized_item_aspect_predicted zhenggao_normalized_item_aspect zhenggao_pearson_correlation;"

odpscmd -e "add jar /Users/zhenggao/Documents/workspace/zheng_content_free_sentiment_analysis.jar -f"



odpscmd -e "set odps.graph.use.multiple.input.output=true;set odps.graph.worker.num=550;set odps.graph.worker.memory=32768; jar -libjars zheng_graph_random_walk.jar -classpath /Users/zhenggao/Documents/workspace/zheng_graph_random_walk.jar -resources commons-math3-3.6.1.jar,zhenggao_edge_type_transition_matrix UserBehaviorBasedContentGraph graph_vertex_filter_2 graph_edge_filter_2 zhenggao_edge_type_transition_matrix 2;"
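These one-liners get long and easy to break when editing. A minimal sketch that assembles the same `set ...; jar ...;` string from parts, using only the flags that already appear in the commands above; `build_jar_command` is a hypothetical helper. It returns an argv list so it can be passed straight to `subprocess.run` without shell quoting:

```python
import shlex


def build_jar_command(main_class, args, libjars, classpath,
                      resources=(), settings=None):
    """Build an argv list that runs an ODPS `jar` job through odpscmd -e."""
    statements = [f"set {key}={value};" for key, value in (settings or {}).items()]
    jar = ["jar", "-libjars", ",".join(libjars), "-classpath", classpath]
    if resources:
        jar += ["-resources", ",".join(resources)]
    jar += [main_class, *(str(a) for a in args)]
    statements.append(" ".join(jar) + ";")
    # A list (not a shell string) can go to subprocess.run directly,
    # so the script, not the shell, owns the quoting.
    return ["odpscmd", "-e", " ".join(statements)]


if __name__ == "__main__":
    cmd = build_jar_command(
        main_class="UserBehaviorBasedContentGraph",
        args=["graph_vertex_filter_2", "graph_edge_filter_2",
              "zhenggao_edge_type_transition_matrix", 2],
        libjars=["zheng_graph_random_walk.jar"],
        classpath="/Users/zhenggao/Documents/workspace/zheng_graph_random_walk.jar",
        resources=["commons-math3-3.6.1.jar", "zhenggao_edge_type_transition_matrix"],
        settings={
            "odps.graph.use.multiple.input.output": "true",
            "odps.graph.worker.num": 550,
            "odps.graph.worker.memory": 32768,
        },
    )
    print(shlex.join(cmd))  # copy-pasteable shell form (Python 3.8+)
```

Keeping the settings in a dict makes it easy to tune worker count or memory per run without touching the rest of the command.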


SQL statements:
read gaozheng_title_segment 1

insert overwrite table gaozheng_title_segment select item as id, alinlp_segment(regexp_replace(title,' ',''),"MAINSE"," ") as segment from tmp_filtered_item_title;

select regexp_replace('ac d',' ','') from dual;
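To check locally what the title cleanup does before segmentation, here is a Python equivalent of `regexp_replace(title, ' ', '')`; `remove_spaces` is a hypothetical name. Note the pattern is a single ASCII space, so tabs and full-width spaces are left untouched:

```python
import re


def remove_spaces(title):
    # Mirrors regexp_replace(title, ' ', ''): every match of the pattern " "
    # is replaced with the empty string. NULL (None) stays NULL.
    if title is None:
        return None
    return re.sub(" ", "", title)


if __name__ == "__main__":
    print(remove_spaces("ac d"))  # acd
```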



read search_kg_dev.user_aspect_item_temp partition(ds='20180603') 5;

Tunnel:

tunnel upload log.txt test_project.test_table/p1="b1",p2="b2";
tunnel download -fd ### review_content_for_each_period/period=20180610-20180615 review_content.txt;
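The download above uses `###` as the field delimiter. Python's `csv` module only accepts single-character delimiters, so a plain split is simpler. A minimal sketch, assuming one record per line (`\n` or `\r\n` endings) and that `###` never appears inside a field; `read_delimited` is a hypothetical helper:

```python
import io
from typing import Iterable, List


def read_delimited(lines: Iterable[str], delimiter: str = "###") -> List[List[str]]:
    """Split each newline-terminated record on a multi-character delimiter."""
    records = []
    for line in lines:
        line = line.rstrip("\r\n")  # handles both \n and \r\n endings
        if line:
            records.append(line.split(delimiter))
    return records


if __name__ == "__main__":
    sample = io.StringIO("1001###great product\r\n1002###too small\r\n")
    for fields in read_delimited(sample):
        print(fields)
```

For the real file: `with open("review_content.txt", encoding="utf-8") as f: rows = read_delimited(f)`. If a review can contain a line break, this one-record-per-line assumption no longer holds.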


If you don't have permission, append --user.
For example, when installing pip itself or using pip to install other packages: pip install numpy --user
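`--user` is only useful outside a virtual environment; inside a typical venv pip rejects it because user site-packages are not visible there. A minimal sketch that picks the right form for the current interpreter; `pip_install_command` is a hypothetical helper, and the `real_prefix` check is for environments created by old virtualenv releases:

```python
import sys


def in_virtualenv() -> bool:
    # venv sets sys.prefix to the env directory while sys.base_prefix keeps
    # pointing at the base interpreter; old virtualenv set sys.real_prefix.
    return hasattr(sys, "real_prefix") or sys.prefix != sys.base_prefix


def pip_install_command(package: str) -> list:
    """Build a pip install command for the running interpreter.

    Outside a virtual environment (no root rights), --user installs into the
    per-user site-packages; inside a venv the flag is left out.
    """
    cmd = [sys.executable, "-m", "pip", "install", package]
    if not in_virtualenv():
        cmd.append("--user")
    return cmd


if __name__ == "__main__":
    print(pip_install_command("numpy"))
```

Pass the result to `subprocess.run(..., check=True)` to actually install. Using `sys.executable -m pip` guarantees the package goes to the same Python that runs the script.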

On the server, use a virtual environment; otherwise pip3 cannot install packages.
Use python3 and pip3.
Activate: source /home/zheng.gz/env/bin/activate
Deactivate: deactivate
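The note does not say how `/home/zheng.gz/env` was created. One option is the standard-library `venv` module (the same as `python3 -m venv <dir>`); a minimal sketch, with `create_env` and `activate_script` as hypothetical helpers:

```python
import tempfile
import venv
from pathlib import Path


def create_env(env_dir: str, with_pip: bool = True) -> Path:
    """Create a virtual environment, like `python3 -m venv env_dir`."""
    venv.EnvBuilder(with_pip=with_pip).create(env_dir)
    return Path(env_dir)


def activate_script(env_dir: str) -> Path:
    # The script that `source <env_dir>/bin/activate` loads on Linux/macOS.
    return Path(env_dir) / "bin" / "activate"


if __name__ == "__main__":
    # Demo in a throwaway directory; skip pip so it runs quickly.
    with tempfile.TemporaryDirectory() as tmp:
        env = create_env(str(Path(tmp) / "env"), with_pip=False)
        print((env / "pyvenv.cfg").exists(), activate_script(str(env)).exists())
```

On the server this would be `create_env("/home/zheng.gz/env")`, then `source /home/zheng.gz/env/bin/activate`; `with_pip=True` needs `ensurepip`, which some distro Python packages ship separately.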

To install just pip, see https://pip.pypa.io/en/stable/installing/#id7
tmux can only be installed by the root user: sudo yum install tmux


