diff --git a/.gitignore b/.gitignore index 0fa412e..56b5210 100644 --- a/.gitignore +++ b/.gitignore @@ -1,4 +1,3 @@ -*.csv output/*.csv .DS_Store .idea diff --git a/README.md b/README.md index 510251e..785250a 100644 --- a/README.md +++ b/README.md @@ -1,5 +1,9 @@ # ByteCodeDL +
+ ByteCodeDL +
+ A declarative static analysis tool for jvm bytecode based Datalog like CodeQL ## Why ByteCodeDL @@ -24,21 +28,43 @@ you can use the docker we builded like docker-compose.yml ## Features - [x] 搜索功能 -- [ ] 调用图分析 +- [x] 调用图分析 - [x] CHA - - [ ] RTA + - [x] SIMPLE-CHA + - [x] RTA - [ ] 指针分析 - [x] 上下文无关指针分析 - - [ ] 一阶上下文调用点敏感指针分析 - - [ ] 一阶上下文对象敏感指针分析 - - [ ] 一阶上下文类型敏感指针分析 -- [ ] 污点分析 + - [x] 一阶上下文调用点敏感指针分析 + - [x] 一阶上下文对象敏感指针分析 + - [x] 一阶上下文类型敏感指针分析 + - [ ] 可选择上下文敏感指针分析 +- [x] 污点分析 - [x] 上下文无关ptaint + - [x] 上下文敏感ptaint +- [ ] 输出sarif +- [ ] 实现JackEE +- [ ] 性能优化 ## Usage 见docs文件夹 +## Support + +在使用中遇到什么问题,可以通过 + +- email: help@bytecodedl.com +- github: issue/discussion +- telegram: [bytecodedl](https://t.me/bytecodedl) + +三种途径向我们反馈 + +## Plugin + +- IDEA + - ByteCodeDL helper [BDLH](https://github.com/BytecodeDL/BDLH) + - Datalog language plugin [intellij-datalog](https://github.com/BytecodeDL/intellij-datalog) + ## Acknowledgement - 感谢南大的李樾和谭添两位老师,通过他们开设的[程序分析课程](https://pascal-group.bitbucket.io/teaching.html)入门了静态分析这一领域。 diff --git a/bdl-logo.png b/bdl-logo.png new file mode 100644 index 0000000..3274319 Binary files /dev/null and b/bdl-logo.png differ diff --git a/docker-compose.yml b/docker-compose.yml index 0f77ab8..e4435eb 100644 --- a/docker-compose.yml +++ b/docker-compose.yml @@ -1,18 +1,16 @@ version: '2.4' services: bytecodedl: - image: wuxxxxx/bytecodedl:1.0.0 + image: wuxxxxx/bytecodedl:1.0.2 restart: always command: sleep infinity volumes: - ./:/bytecodedl neo: - image: neo4j:4.4.4-community + image: wuxxxxx/neo4j-server:5.12.0-bytecodedl-pathfinder-1.0.1 restart: always ports: - "0.0.0.0:7474:7474" - "0.0.0.0:7687:7687" - environment: - - NEO4J_AUTH=neo4j/bytecodedl volumes: - - ./:/bytecodedl \ No newline at end of file + - ./:/bytecodedl diff --git a/docker/bytecodedl/Dockerfile b/docker/bytecodedl/Dockerfile new file mode 100644 index 0000000..b5e456a --- /dev/null +++ b/docker/bytecodedl/Dockerfile @@ -0,0 +1,16 @@ +From ubuntu:22.04 + +LABEL version="1.0.2" +LABEL maintainer="yxxx " + +RUN apt-get update \ + && apt-get install -y vim \ + && apt-get install -y wget \ + && apt-get install -y git \ + && apt-get install -y openjdk-8-jdk + +RUN wget https://souffle-lang.github.io/ppa/souffle-key.public -O /usr/share/keyrings/souffle-archive-keyring.gpg \ + && echo "deb [signed-by=/usr/share/keyrings/souffle-archive-keyring.gpg] https://souffle-lang.github.io/ppa/ubuntu/ stable main" | tee /etc/apt/sources.list.d/souffle.list \ + && apt update && apt install -y souffle + +RUN wget https://github.com/BytecodeDL/soot-fact-generator/releases/download/v1.4.2/soot-fact-generator-1.4.2.jar \ No newline at end of file diff --git a/docker/neo4j-server/Dockerfile b/docker/neo4j-server/Dockerfile new file mode 100644 index 0000000..303c642 --- /dev/null +++ b/docker/neo4j-server/Dockerfile @@ -0,0 +1,9 @@ +From neo4j:5.12.0 + +LABEL version="1.0.1" +LABEL maintainer="yxxx " + +RUN wget https://github.com/BytecodeDL/bytecodedl-pathfinder-neo4j-procedure/releases/download/v1.0.1/bytecodedl-pathfinder-1.0.1.jar -O /var/lib/neo4j/plugins/bytecodedl-pathfinder-1.0.1.jar + +ENV NEO4J_AUTH=neo4j/bytecodedl \ + NEO4J_dbms_security_procedures_unrestricted=bytecodedl.* \ No newline at end of file diff --git a/docs/callgraph.md b/docs/callgraph.md index 38f7d74..5183f9b 100644 --- a/docs/callgraph.md +++ b/docs/callgraph.md @@ -140,7 +140,7 @@ CHA只会存在误报,但是RTA既可能存在误报也可能存在漏报。 1. 执行下面命令将结果输出到output文件夹 ```bash - souffle -I ~/code/ByteCodeDL/logic -F factsdir -D ~/code/ByteCodeDL/output ~/code/ByteCodeDL/example/cha-example-1.dl + souffle -F factsdir -D ~/code/ByteCodeDL/output ~/code/ByteCodeDL/example/cha-example-1.dl ``` 2. 执行bash importOutput2Neo4j.sh neoImportCall.sh dbname diff --git a/docs/context-insensitive-points-to.md b/docs/context-insensitive-points-to.md index 848eec1..5aa8ecf 100644 --- a/docs/context-insensitive-points-to.md +++ b/docs/context-insensitive-points-to.md @@ -263,7 +263,7 @@ cipt.Reachable(method) :- 然后执行 ```bash -souffle -I ByteCodeDL/logic -F factsdir -D ByteCodeDL/output ByteCodeDL/example/pt-noctx-example-1.dl +souffle -F factsdir -D ByteCodeDL/output ByteCodeDL/example/pt-noctx-example-1.dl ``` 分析结果保存在 output/cipt.VarPointsTo diff --git a/docs/ptaint.md b/docs/ptaint.md index 469ab5a..654dcb9 100644 --- a/docs/ptaint.md +++ b/docs/ptaint.md @@ -375,7 +375,7 @@ java8 -jar ~/code/soot-fact-generator/build/libs/soot-fact-generator.jar -i Benc // 切换目录 cd tainttest // 执行souffle -souffle -I ~/code/ByteCodeDL/logic -F . -D output ~/code/ByteCodeDL/example/ptaint-example-1.dl +souffle -F . -D output ~/code/ByteCodeDL/example/ptaint-example-1.dl ``` 然后在ouput目录能够看到`grep "Demo3" TaintVar.csv`结果 diff --git a/docs/readme.md b/docs/readme.md index ea6a55e..9f66840 100644 --- a/docs/readme.md +++ b/docs/readme.md @@ -10,3 +10,7 @@ 3. [utils.md](utils.md) 4. [query.md](query.md) 5. [callgraph.md](callgraph.md) +6. [cha-optimization.md](cha-optimization.md) +7. [cha-in-ctf.md](cha-in-ctf.md) +8. [context-insensitive-points-to.md](context-insensitive-points-to.md) +9. [ptaint](ptaint.md) diff --git a/docs/utils.md b/docs/utils.md index b78025d..3162464 100644 --- a/docs/utils.md +++ b/docs/utils.md @@ -1,6 +1,6 @@ # utils -## Class Hierachy +## Class Hierarchy 需要构建一个类型层次图,用于寻找某个类的子类、父类,或者用于判断两个类之间是否有继承关系。 diff --git a/example/cha-example-1.dl b/example/cha-example-1.dl index 415df41..32beeb2 100644 --- a/example/cha-example-1.dl +++ b/example/cha-example-1.dl @@ -1,8 +1,6 @@ #define MAXSTEP 8 -#include "inputDeclaration.dl" -#include "utils.dl" -#include "cha.dl" +#include "../logic/cha.dl" // init entrypoint diff --git a/example/cha-log4shell.dl b/example/cha-log4shell.dl new file mode 100644 index 0000000..4b9f606 --- /dev/null +++ b/example/cha-log4shell.dl @@ -0,0 +1,18 @@ +#define MAXSTEP 33 +#define CHAO 1 + +#include "../logic/cha.dl" + +BanCaller(method) :- + MethodInfo(method, _, _, class, _, _, _), + !contains("org.apache.logging.log4j", class). + + +SinkDesc("lookup", "javax.naming.Context"). + +// init entrypoint +EntryPoint(simplename, descriptor, class) :- + MethodInfo(_, simplename, _, class, _, descriptor, _), + simplename = "error", + class = "org.apache.logging.log4j.spi.AbstractLogger", + descriptor = "(Ljava/lang/String;)V". \ No newline at end of file diff --git a/example/cs-ptaint-example-1.dl b/example/cs-ptaint-example-1.dl new file mode 100644 index 0000000..d1614c0 --- /dev/null +++ b/example/cs-ptaint-example-1.dl @@ -0,0 +1,39 @@ +#define MAXSTEP 8 +//#include "../logic/one-callsite-sensitive-pt.dl" +#include "../logic/cs-ptaint.dl" +#include "../logic/one-object-sensitive-pt.dl" + +//.comp MyCallsitePtaint : CSPTaint, OneCallsiteSensitivePT{ +//} +//.init csptaint = MyCallsitePtaint +.comp MyObjectPtaint : CSPTaint, OneObjectSensitivePT{ +} + + +.init csptaint = MyObjectPtaint + + +csptaint.Reachable("", "initCtx", 0). + +csptaint.SourceMethod(""). +csptaint.SinkMethod("", 0). + +csptaint.BaseToRetTransfer(""). +csptaint.BaseToRetTransfer(""). + +csptaint.ArgToRetTransfer("", 0). + +csptaint.SanitizeMethod(""). + + +.decl TaintVar(var:Var, vCtx:Context) + +TaintVar(var, vCtx) :- + csptaint.VarPointsTo(heap, _, var, vCtx), + csptaint.TaintHeap(_, heap). + +.output TaintVar + +.output csptaint.TaintHeap +.output csptaint.TransferTaint +.output csptaint.VarPointsTo \ No newline at end of file diff --git a/example/ctf-buggyLoader.dl b/example/ctf-buggyLoader.dl index f6052d9..266911a 100644 --- a/example/ctf-buggyLoader.dl +++ b/example/ctf-buggyLoader.dl @@ -1,9 +1,7 @@ #define MAXSTEP 5 #define CHAO 2 -#include "inputDeclaration.dl" -#include "utils.dl" -#include "cha.dl" +#include "../logic/cha.dl" .decl NonParamPublicMethod(method:Method, class:Class) diff --git a/example/ctf-ezchain.dl b/example/ctf-ezchain.dl index 59355fa..7a5be3d 100644 --- a/example/ctf-ezchain.dl +++ b/example/ctf-ezchain.dl @@ -1,9 +1,7 @@ #define MAXSTEP 5 #define CHAO 2 -#include "inputDeclaration.dl" -#include "utils.dl" -#include "cha.dl" +#include "../logic/cha.dl" .decl NonParamPublicMethod(method:Method, class:Class) diff --git a/example/one-callsite-sensitive-pt-example-1.dl b/example/one-callsite-sensitive-pt-example-1.dl new file mode 100644 index 0000000..d640850 --- /dev/null +++ b/example/one-callsite-sensitive-pt-example-1.dl @@ -0,0 +1,10 @@ +#define MAXSTEP 8 +#include "../logic/one-callsite-sensitive-pt.dl" + +.init callsitecsDemo1 = OneCallsiteSensitivePT +callsitecsDemo1.Reachable("", "initCtx", 0). +.output callsitecsDemo1.VarPointsTo + +.init callsitecsDemo2 = OneCallsiteSensitivePT +callsitecsDemo2.Reachable("", "initCtx", 0). +.output callsitecsDemo2.VarPointsTo \ No newline at end of file diff --git a/example/one-object-sensitive-pt-example-1.dl b/example/one-object-sensitive-pt-example-1.dl new file mode 100644 index 0000000..69f37d1 --- /dev/null +++ b/example/one-object-sensitive-pt-example-1.dl @@ -0,0 +1,10 @@ +#define MAXSTEP 8 +#include "../logic/one-object-sensitive-pt.dl" + +.init objectDemo1 = OneObjectSensitivePT +objectDemo1.Reachable("", "initCtx", 0). +.output objectDemo1.VarPointsTo + +.init objectDemo2 = OneObjectSensitivePT +objectDemo2.Reachable("", "initCtx", 0). +.output objectDemo2.VarPointsTo \ No newline at end of file diff --git a/example/one-type-sensitive-pt-example-1.dl b/example/one-type-sensitive-pt-example-1.dl new file mode 100644 index 0000000..741891e --- /dev/null +++ b/example/one-type-sensitive-pt-example-1.dl @@ -0,0 +1,10 @@ +#define MAXSTEP 8 +#include "../logic/one-type-sensitive-pt.dl" + +.init typeDemo1 = OneTypeSensitivePT +typeDemo1.Reachable("", "initCtx", 0). +.output typeDemo1.VarPointsTo + +.init typeDemo2 = OneTypeSensitivePT +typeDemo2.Reachable("", "initCtx", 0). +.output typeDemo2.VarPointsTo \ No newline at end of file diff --git a/example/pt-noctx-example-1.dl b/example/pt-noctx-example-1.dl index 665b49f..48ce998 100644 --- a/example/pt-noctx-example-1.dl +++ b/example/pt-noctx-example-1.dl @@ -1,6 +1,4 @@ -#include "inputDeclaration.dl" -#include "utils.dl" -#include "pt-noctx.dl" +#include "../logic/pt-noctx.dl" .init cipt = ContextInsensitivePt diff --git a/example/ptaint-example-1.dl b/example/ptaint-example-1.dl index 4ca9a2a..8252655 100644 --- a/example/ptaint-example-1.dl +++ b/example/ptaint-example-1.dl @@ -1,10 +1,8 @@ -#include "inputDeclaration.dl" -#include "utils.dl" -#include "ptaint.dl" +#include "../logic/ptaint.dl" .init ptaint = PTaint -ptaint.cipt.Reachable(""). +ptaint.Reachable(""). ptaint.SourceMethod(""). ptaint.SinkMethod("", 0). @@ -20,11 +18,11 @@ ptaint.SanitizeMethod(""). -ptaint.cipt.Reachable(method) :- +ptaint.Reachable(method) :- EntryMethod(method). NormalHeap(heap, class), -ptaint.cipt.VarPointsTo(heap, this) :- +ptaint.VarPointsTo(heap, this) :- ThisVar(method, this), EntryMethod(method), VarType(this, class), @@ -21,8 +19,8 @@ ptaint.cipt.VarPointsTo(heap, this) :- NormalHeap(heap, class), ptaint.TaintHeap(insn, taintHeap), -ptaint.cipt.VarPointsTo(heap, param), -ptaint.cipt.VarPointsTo(taintHeap, param) :- +ptaint.VarPointsTo(heap, param), +ptaint.VarPointsTo(taintHeap, param) :- EntryMethod(method), FormalParam(_, method, param), VarType(param, class), @@ -45,11 +43,11 @@ ptaint.SanitizeMethod("{ + .decl VarPointsTo(heap:Heap, hctx:HContext, var:Var, vCtx:Context) + .decl InstanceFieldPointsTo(heap:Heap, hctx:Context, baseHeap:Heap, bhCtx:Context, field:Field) + .decl StaticFieldPointsTo(heap:Heap, hctx:Context, field:Field) + .decl ArrayIndexPointsTo(heap:Heap, hctx:Context, baseHeap:Heap, bhCtx:Context) + .decl Reachable(method:Method, ctx:Context, n:number) + .decl CallGraph(insn:Insn, caller:Method, callerCtx:Context, callee:Method, calleeCtx:Context) + + .decl SelectInvocationContext(callerCtx:Context, invocation:Insn, baseHeap:Heap, hctx:HContext, calleeCtx:Context) overridable + .decl SelectStaticInvocationContext(callerCtx:Context, invocation:Insn, calleeCtx:Context) overridable + + SelectStaticInvocationContext(callerCtx, insn, calleeCtx) :- + Reachable(caller, callerCtx, _), + StaticMethodInvocation(insn, _, _, caller), + calleeCtx = callerCtx. + + // new + VarPointsTo(heap, ctx, var, ctx) :- + Reachable(method, ctx, _), + AssignHeapAllocation(_, _, heap, var, method, _). + + // assign + VarPointsTo(heap, hctx, to, ctx) :- + Reachable(method, ctx, _), + VarPointsTo(heap, hctx, from, ctx), + AssignLocal(_, _, from, to, method). + + // cast + VarPointsTo(heap, hctx, to, ctx) :- + Reachable(method, ctx, _), + AssignCast(_, _, from, to, _, method), + VarPointsTo(heap, hctx, from, ctx). + + // load field + VarPointsTo(heap, hctx, to, ctx) :- + Reachable(method, ctx, _), + LoadInstanceField(_, _, to, base, field, method), + VarPointsTo(baseHeap, bhCtx, base, ctx), + InstanceFieldPointsTo(heap, hctx, baseHeap, bhCtx, field). + + // store field + InstanceFieldPointsTo(heap, hctx, baseHeap, bhCtx, field) :- + Reachable(method, ctx, _), + StoreInstanceField(_, _, from, base, field, method), + VarPointsTo(heap, hctx, from, ctx), + VarPointsTo(baseHeap, bhCtx, base, ctx). + + // load staic field + VarPointsTo(heap, hctx, to, ctx) :- + Reachable(method, ctx, _), + LoadStaticField(_, _, to, field, method), + StaticFieldPointsTo(heap, hctx, field). + + // store static field + StaticFieldPointsTo(heap, hctx, field) :- + Reachable(method, ctx, _), + StoreStaticField(_, _, from, field, method), + VarPointsTo(heap, hctx, from, ctx). + + // load from array + VarPointsTo(heap, hctx, to, ctx) :- + Reachable(method, ctx, _), + LoadArrayIndex(_, _, to, base, method), + VarPointsTo(baseHeap, bhCtx, base, ctx), + ArrayIndexPointsTo(heap, hctx, baseHeap, bhCtx). + + // store into array + ArrayIndexPointsTo(heap, hctx, baseHeap, bhCtx) :- + Reachable(method, ctx, _), + StoreArrayIndex(_, _, from, base, method), + VarPointsTo(heap, hctx, from, ctx), + VarPointsTo(baseHeap, bhCtx, base, ctx). + + Reachable(callee, calleeCtx, n+1), + CallGraph(insn, caller, callerCtx, callee, calleeCtx) :- + Reachable(caller, callerCtx, n), + n < MAXSTEP, + SpecialMethodInvocation(insn, _, callee, base, caller), + VarPointsTo(baseHeap, hctx, base, calleeCtx), + SelectInvocationContext(callerCtx, insn, baseHeap, hctx, callerCtx). + + Reachable(callee, calleeCtx, n+1), + CallGraph(insn, caller, callerCtx, callee, calleeCtx) :- + Reachable(caller, callerCtx, n), + n < MAXSTEP, + StaticMethodInvocation(insn, _, callee, caller), + SelectStaticInvocationContext(callerCtx, insn, calleeCtx). + + Reachable(callee, calleeCtx, n+1), + CallGraph(insn, caller, callerCtx, callee, calleeCtx) :- + Reachable(caller, callerCtx, n), + n < MAXSTEP, + VirtualMethodInvocation(insn, _, method, base, caller), + VarPointsTo(baseHeap, hctx, base, callerCtx), + NormalHeap(baseHeap, class), + MethodInfo(method, simplename, _, _, _, descriptor, _), + Dispatch(simplename, descriptor, class, callee), + SelectInvocationContext(callerCtx, insn, baseHeap, hctx, calleeCtx). + + // param + VarPointsTo(heap, hctx, param, calleeCtx) :- + CallGraph(insn, _, callerCtx, callee, calleeCtx), + ActualParam(n, insn, arg), + FormalParam(n, callee, param), + VarPointsTo(heap, hctx, arg, callerCtx), + NormalHeap(heap, _). + + // return + VarPointsTo(heap, hctx, return, callerCtx) :- + CallGraph(insn, _, callerCtx, callee, calleeCtx), + Return(_, _, var, callee), + AssignReturnValue(insn, return), + VarPointsTo(heap, hctx, var, calleeCtx). + + // this + VarPointsTo(heap, hctx, this, calleeCtx) :- + CallGraph(insn, _, callerCtx, callee, calleeCtx), + ( + VirtualMethodInvocation(insn, _, _, base, _); + SpecialMethodInvocation(insn, _, _, base, _) + ), + ThisVar(callee, this), + VarPointsTo(heap, hctx, base, callerCtx). +} \ No newline at end of file diff --git a/logic/cha.dl b/logic/cha.dl index 1e592fe..d290216 100644 --- a/logic/cha.dl +++ b/logic/cha.dl @@ -1,3 +1,5 @@ +#pragma once +#include "utils.dl" .decl EntryPoint(simplename:symbol, descriptor:symbol, class:Class) .decl Reachable(method:Method, step:number) @@ -12,7 +14,7 @@ SinkMethod(method) :- SinkDesc(simplename, class), SubEqClass(subeqclass, class), - !ClassModifier("abstract", subeqclass), + !MethodModifier("abstract", method), MethodInfo(method, simplename, _, subeqclass, _, _, _). EntryMethod(method), @@ -91,7 +93,7 @@ ShortestPathToSink(callee, sink, n-1) :- .decl RefinedReachable(method:Method) -#if defined(CHAO) +#ifdef CHAO #if CHAO == 1 RefinedReachable(method) :- SinkReachable(method, _, _). @@ -121,10 +123,10 @@ CallNode(node, "entry") :- RefinedReachable(node), EntryMethod(node). -.decl CallEdge(caller:Method, callee:Method) +.decl CallEdge(caller:Method, insn:Insn, callee:Method) .output CallEdge -CallEdge(caller, callee) :- +CallEdge(caller, insn, callee) :- RefinedReachable(caller), RefinedReachable(callee), - CallGraph(_, caller, callee). + CallGraph(insn, caller, callee). diff --git a/logic/cs-ptaint.dl b/logic/cs-ptaint.dl new file mode 100644 index 0000000..524394c --- /dev/null +++ b/logic/cs-ptaint.dl @@ -0,0 +1,64 @@ +#pragma once +#include "utils.dl" +#include "abstract-context-sensitive-pt.dl" + +.comp CSPTaint{ + //.init cspt = AbstractContextSensitivePT + //.init cspt = CSPT + + .decl TaintHeap(insn:Insn, heap:Heap) + .decl SourceMethod(method:Method) + .decl SinkMethod(method:Method, n:number) + + .decl SanitizeMethod(method:Method) + + .decl BaseToRetTransfer(method:Method) + .decl ArgToRetTransfer(method:Method, n:number) + .decl IsTaintedFrom(insn:Insn, from:Var, fromCtx:Context, to:Var, toCtx:Context) + .decl TransferTaint(heap:Heap, newHeap:Heap) + + + // taint arg to param + VarPointsTo(heap, hctx, param, calleeCtx) :- + CallGraph(insn, _, callerCtx, callee, calleeCtx), + ActualParam(n, insn, arg), + FormalParam(n, callee, param), + VarPointsTo(heap, hctx, arg, callerCtx), + TaintHeap(_, heap), // sensitive? + !SanitizeMethod(callee). + + + TaintHeap(insn, heap), + VarPointsTo(heap, hctx, to, callerCtx):- + SourceMethod(callee), + CallGraph(insn, _, callerCtx, callee, _), + AssignReturnValue(insn, to), + heap = cat("NewTainted::", insn), + hctx = "Mock::". + + IsTaintedFrom(insn, base, callerCtx, ret, callerCtx) :- + CallGraph(insn, _, callerCtx, callee, _), + BaseToRetTransfer(callee), + ( + VirtualMethodInvocation(insn, _, _, base, _); + SpecialMethodInvocation(insn, _, _, base, _) + ), + AssignReturnValue(insn, ret). + + IsTaintedFrom(insn, arg, callerCtx, ret, callerCtx) :- + CallGraph(insn, _, callerCtx, callee, _), + ArgToRetTransfer(callee, n), + ActualParam(n, insn, arg), + AssignReturnValue(insn, ret). + + TaintHeap(insn, newHeap), + TransferTaint(heap, newHeap), + VarPointsTo(heap, hctx, var, vCtx) :- + IsTaintedFrom(insn, from, fromCtx, to, toCtx), + VarPointsTo(heap, hctx, from, fromCtx), + TaintHeap(_, heap), + newHeap = cat("TransferTaint::", insn), + VarPointsTo(oldHeap, oldHCtx, to, toCtx), + VarPointsTo(oldHeap, oldHCtx, var, vCtx), + AssignHeapAllocation(_, _, oldHeap, var, _, _). +} \ No newline at end of file diff --git a/logic/inputDeclaration.dl b/logic/inputDeclaration.dl index 904d634..833ce6b 100644 --- a/logic/inputDeclaration.dl +++ b/logic/inputDeclaration.dl @@ -1,3 +1,5 @@ +#pragma once + .type Insn <: symbol .type Var <: symbol .type Heap <: symbol @@ -29,10 +31,10 @@ // method .decl MethodInfo(method:Method, simplename:symbol, param:symbol, class:Class, return:Class, jvmDescriptor:symbol, arity:number) -.input MethodInfo(IO="file", filename="Method.facts", delimiter="\t") +.input MethodInfo(IO=file, filename="Method.facts", delimiter="\t") .decl MethodModifier(mod:symbol, method:Method) -.input MethodModifier(filename="Method-Modifier.facts") +.input MethodModifier(IO=file, filename="Method-Modifier.facts", delimiter="\t") .decl ThisVar(method:Method, this:Var) .input ThisVar @@ -61,10 +63,10 @@ // Field .decl FieldInfo(field:Field, declaringType:Class, simplename:symbol, type:Class) -.input FieldInfo(IO="file", filename="Field.facts", delimiter="\t") +.input FieldInfo(IO=file, filename="Field.facts", delimiter="\t") .decl FieldModifier(modifier:symbol, field:Field) -.input FieldModifier(filename="Field-Modifier.facts") +.input FieldModifier(IO=file, filename="Field-Modifier.facts", delimiter="\t") .decl LoadInstanceField(insn:Insn, index:number, var:Var, base:Var, field:Field, inMethod:Method) .input LoadInstanceField @@ -90,13 +92,13 @@ // others .decl VarType(var:Var, class:Class) -.input VarType(IO="file", filename="Var-Type.facts", delimiter="\t") +.input VarType(IO=file, filename="Var-Type.facts", delimiter="\t") .decl AssignLocal(insn:Insn, index:number, from:Var, to:Var, inMethod: Method) .input AssignLocal .decl AssignCast(insn:Insn, index:number, from:Var, to:Var, type:Class, inMethod:Method) -.input AssignCast(filename="AssignCast.facts") +.input AssignCast(IO=file, filename="AssignCast.facts", delimiter="\t") .decl AssignHeapAllocation(insn:Insn, index:number, heap:Heap, var:Var, inMethod:Method, linenumber:number) .input AssignHeapAllocation diff --git a/logic/one-callsite-sensitive-pt.dl b/logic/one-callsite-sensitive-pt.dl new file mode 100644 index 0000000..1267533 --- /dev/null +++ b/logic/one-callsite-sensitive-pt.dl @@ -0,0 +1,22 @@ +#pragma once +#include "abstract-context-sensitive-pt.dl" + +.type HContext = Insn +.type Context = Insn + +.comp OneCallsiteSensitivePT: AbstractContextSensitivePT{ + .override SelectInvocationContext + SelectInvocationContext(callerCtx, insn, baseHeap, hctx, calleeCtx) :- + Reachable(caller, callerCtx, _), + ( + SpecialMethodInvocation(insn, _, _, base, caller); + VirtualMethodInvocation(insn, _, _, base, caller) + ), + VarPointsTo(baseHeap, hctx, base, callerCtx), + calleeCtx = insn. + + SelectStaticInvocationContext(callerCtx, insn, calleeCtx) :- + Reachable(caller, callerCtx, _), + StaticMethodInvocation(insn, _, _, caller), + calleeCtx = insn. +} \ No newline at end of file diff --git a/logic/one-object-sensitive-pt.dl b/logic/one-object-sensitive-pt.dl new file mode 100644 index 0000000..d3f204d --- /dev/null +++ b/logic/one-object-sensitive-pt.dl @@ -0,0 +1,17 @@ +#pragma once +#include "abstract-context-sensitive-pt.dl" + +.type HContext = Heap +.type Context = Heap + +.comp OneObjectSensitivePT: AbstractContextSensitivePT{ + .override SelectInvocationContext + SelectInvocationContext(callerCtx, insn, baseHeap, hctx, calleeCtx) :- + Reachable(caller, callerCtx, _), + ( + SpecialMethodInvocation(insn, _, _, base, caller); + VirtualMethodInvocation(insn, _, _, base, caller) + ), + VarPointsTo(baseHeap, hctx, base, callerCtx), + calleeCtx = baseHeap. +} \ No newline at end of file diff --git a/logic/one-type-sensitive-pt.dl b/logic/one-type-sensitive-pt.dl new file mode 100644 index 0000000..48ff9cf --- /dev/null +++ b/logic/one-type-sensitive-pt.dl @@ -0,0 +1,19 @@ +#pragma once +#include "abstract-context-sensitive-pt.dl" + +.type HContext = Class +.type Context = Class + +.comp OneTypeSensitivePT: AbstractContextSensitivePT{ + .override SelectInvocationContext + SelectInvocationContext(callerCtx, insn, baseHeap, hctx, calleeCtx) :- + Reachable(caller, callerCtx, _), + ( + SpecialMethodInvocation(insn, _, _, base, caller); + VirtualMethodInvocation(insn, _, _, base, caller) + ), + VarPointsTo(baseHeap, hctx, base, callerCtx), + AssignHeapAllocation(_, _, baseHeap, _, inmethod, _), + MethodInfo(inmethod, _, _, inType, _, _, _), + calleeCtx = inType. +} \ No newline at end of file diff --git a/logic/pt-noctx.dl b/logic/pt-noctx.dl index e662258..217a15f 100644 --- a/logic/pt-noctx.dl +++ b/logic/pt-noctx.dl @@ -1,3 +1,6 @@ +#pragma once +#include "utils.dl" + .comp ContextInsensitivePt{ .decl VarPointsTo(heap:Heap, var:Var) .decl InstanceFieldPointsTo(heap:Heap, baseHeap:Heap, field:Field) diff --git a/logic/ptaint.dl b/logic/ptaint.dl index d9043de..2d83ea0 100644 --- a/logic/ptaint.dl +++ b/logic/ptaint.dl @@ -1,7 +1,7 @@ +#pragma once #include "pt-noctx.dl" -.comp PTaint{ - .init cipt = ContextInsensitivePt +.comp PTaint : ContextInsensitivePt{ .decl TaintHeap(insn:Insn, heap:Heap) .decl SourceMethod(method:Method) @@ -16,41 +16,44 @@ // taint arg to param - cipt.VarPointsTo(heap, param) :- - cipt.CallGraph(insn, _, callee), + VarPointsTo(heap, param) :- + CallGraph(insn, _, callee), ActualParam(n, insn, arg), FormalParam(n, callee, param), - cipt.VarPointsTo(heap, arg), + VarPointsTo(heap, arg), TaintHeap(_, heap), !SanitizeMethod(callee). TaintHeap(insn, heap), - cipt.VarPointsTo(heap, to) :- + VarPointsTo(heap, to) :- SourceMethod(callee), - cipt.CallGraph(insn, _, callee), + CallGraph(insn, _, callee), AssignReturnValue(insn, to), heap = cat("NewTainted::", insn). IsTaintedFrom(insn, base, ret) :- - cipt.CallGraph(insn, _, callee), + CallGraph(insn, _, callee), BaseToRetTransfer(callee), - VirtualMethodInvocation(insn, _, _, base, _), + ( + VirtualMethodInvocation(insn, _, _, base, _); + SpecialMethodInvocation(insn, _, _, base, _) + ), AssignReturnValue(insn, ret). IsTaintedFrom(insn, arg, ret) :- - cipt.CallGraph(insn, _, callee), + CallGraph(insn, _, callee), ArgToRetTransfer(callee, n), ActualParam(n, insn, arg), AssignReturnValue(insn, ret). TaintHeap(insn, newHeap), TransferTaint(heap, newHeap), - cipt.VarPointsTo(newHeap, var) :- + VarPointsTo(newHeap, var) :- IsTaintedFrom(insn, from, to), - cipt.VarPointsTo(heap, from), + VarPointsTo(heap, from), TaintHeap(_, heap), newHeap = cat("TransferTaint::", insn), - cipt.VarPointsTo(oldHeap, to), + VarPointsTo(oldHeap, to), AssignHeapAllocation(_, _, oldHeap, var, _, _). } \ No newline at end of file diff --git a/logic/rta.dl b/logic/rta.dl index f2720cb..deb2019 100644 --- a/logic/rta.dl +++ b/logic/rta.dl @@ -1,3 +1,7 @@ +#pragma once +#include "inputDeclaration.dl" +#include "utils.dl" + .comp RTA{ .decl EntryPoint(simplename:symbol, descriptor:symbol, class:Class) .decl Reachable(method:Method, step:number) @@ -97,7 +101,7 @@ .decl RefinedReachable(method:Method) - #if defined(RTAO) + #ifdef RTAO #if RTAO == 1 RefinedReachable(method) :- SinkReachable(method, _, _). diff --git a/logic/simple-cha.dl b/logic/simple-cha.dl new file mode 100644 index 0000000..240a3b2 --- /dev/null +++ b/logic/simple-cha.dl @@ -0,0 +1,105 @@ +#pragma once +#include "utils.dl" + +.decl EntryPoint(simplename:symbol, descriptor:symbol, class:Class) +.decl Reachable(method:Method, step:number) +.decl SinkDesc(simplename:symbol, class:Class) +.decl SinkMethod(method:Method) +.decl EntryMethod(method:Method) +.decl BanCaller(method:Method) + +BanCaller(method) :- + MethodInfo(method, simplename, _, class, _, _, _), + contains("java.util", class). + +.output BanCaller + + +.decl CallGraph(insn:Insn, caller:Method, callee:Method) +.decl SimpleCallGraph(insn:Insn, caller:Method, callee:Method) +.decl ChaGraph(caller:Method, implementation:Method, callee:Method) + +SinkMethod(method) :- + SinkDesc(simplename, class), + SubEqClass(subeqclass, class), + !MethodModifier("abstract", method), + MethodInfo(method, simplename, _, subeqclass, _, _, _). + +EntryMethod(method), +Reachable(method, 0) :- + EntryPoint(simplename, descriptor, class), + Dispatch(simplename, descriptor, class, method). + +Reachable(callee, n+1), +SimpleCallGraph(insn, caller, callee), +CallGraph(insn, caller, callee) :- + Reachable(caller, n), + !BanCaller(caller), + n < MAXSTEP, + SpecialMethodInvocation(insn, _, callee, _, caller). + +Reachable(callee, n+1), +SimpleCallGraph(insn, caller, callee), +CallGraph(insn, caller, callee) :- + Reachable(caller, n), + !BanCaller(caller), + n < MAXSTEP, + StaticMethodInvocation(insn, _, callee, caller). + +Reachable(method, n+1), +Reachable(callee, n+1), +SimpleCallGraph(insn, caller, method), +CallGraph(insn, caller, callee) :- + Reachable(caller, n), + !BanCaller(caller), + n < MAXSTEP, + VirtualMethodInvocation(insn, _, method, receiver, caller), + MethodInfo(method, simplename, _, _, _, descriptor, _), + VarType(receiver, class), + SubEqClass(subeqclass, class), + !ClassModifier("abstract", subeqclass), + Dispatch(simplename, descriptor, subeqclass, callee). + +Reachable(callee, n+1), +SimpleCallGraph(insn, caller, callee), +CallGraph(insn, caller, callee) :- + Reachable(caller, n), + !BanCaller(caller), + n < MAXSTEP, + StaticMethodInvocation(insn, _, method, caller), + MethodInfo(method, "doPrivileged", _, "java.security.AccessController", _, _, _), + ActualParam(0, insn, param), + VarType(param, class), + MethodInfo(callee, "run", _, class, _, _, 0). + +ChaGraph(caller, implementation, callee) :- + Reachable(caller, _), + MethodInfo(caller, simplename, _, class, _, descriptor, _), + SubClass(subclass, class), + Dispatch(simplename, descriptor, subclass, implementation), + caller != implementation, + SimpleCallGraph(_, implementation, callee). + +.output ChaGraph + +.decl CallNode(node:Method, label:symbol) +.output CallNode + +CallNode(node, "method") :- + !EntryMethod(node), + !SinkMethod(node), + Reachable(node, _). + +CallNode(node, "sink") :- + Reachable(node, _), + SinkMethod(node). + +CallNode(node, "entry") :- + Reachable(node, _), + EntryMethod(node). + +.decl CallEdge(caller:Method, callee:Method) +.output CallEdge + +CallEdge(caller, callee) :- + SimpleCallGraph(_, caller, callee). diff --git a/logic/utils.dl b/logic/utils.dl index bb8a9b0..a7e564a 100644 --- a/logic/utils.dl +++ b/logic/utils.dl @@ -1,9 +1,15 @@ +#pragma once +#include "inputDeclaration.dl" + //self define relation // Utils .decl SubClass(subclass:Class, class:Class) .decl SubEqClass(subeqclass:Class, class:Class) +SubEqClass("byte", "byte"). +SubEqClass("byte[]", "byte[]"). + SubEqClass(subclass, class) :- SubClass(subclass, class). SubEqClass(class, class) :- ClassType(class). diff --git a/neo4j/CallEdgeHeader.csv b/neo4j/CallEdgeHeader.csv new file mode 100644 index 0000000..5e2a8da --- /dev/null +++ b/neo4j/CallEdgeHeader.csv @@ -0,0 +1 @@ +:START_ID(Method) insn :END_ID(Method) \ No newline at end of file diff --git a/neo4j/CallNodeHeader.csv b/neo4j/CallNodeHeader.csv new file mode 100644 index 0000000..27f9b61 --- /dev/null +++ b/neo4j/CallNodeHeader.csv @@ -0,0 +1 @@ +method:ID(Method) :LABEL \ No newline at end of file diff --git a/neo4j/ChaEdgeHeader.csv b/neo4j/ChaEdgeHeader.csv new file mode 100644 index 0000000..27b0681 --- /dev/null +++ b/neo4j/ChaEdgeHeader.csv @@ -0,0 +1 @@ +:START_ID(Method) method :END_ID(Method) \ No newline at end of file diff --git a/neoImportCall-4.4.sh b/neoImportCall-4.4.sh new file mode 100644 index 0000000..0e82b0d --- /dev/null +++ b/neoImportCall-4.4.sh @@ -0,0 +1,11 @@ +#!/bin/bash + +dbname=$1$(date "+%m%d%H%M") + +neo4j-admin database import full --relationships=Call="/bytecodedl/neo4j/CallEdgeHeader.csv,/bytecodedl/output/.*CallEdge.csv" --nodes="/bytecodedl/neo4j/CallNodeHeader.csv,/bytecodedl/output/.*CallNode.csv" --delimiter="\t" $dbname + +if grep -q "dbms.active_database" /var/lib/neo4j/conf/neo4j.conf; then + sed -i -E "s/dbms.active_database=\w+/dbms.active_database=$dbname/g" /var/lib/neo4j/conf/neo4j.conf +else + echo "dbms.active_database=$dbname" >> /var/lib/neo4j/conf/neo4j.conf +fi diff --git a/neoImportCall.sh b/neoImportCall.sh index da1e711..7e2ae73 100644 --- a/neoImportCall.sh +++ b/neoImportCall.sh @@ -2,10 +2,10 @@ dbname=$1$(date "+%m%d%H%M") -neo4j-admin import --relationships=Call="/bytecodedl/neo4j/CallEdgeHeader.csv,/bytecodedl/output/.*CallEdge.csv" --nodes="/bytecodedl/neo4j/CallNodeHeader.csv,/bytecodedl/output/.*CallNode.csv" --database=$dbname --delimiter="\t" +neo4j-admin database import full --nodes="/bytecodedl/neo4j/CallNodeHeader.csv,/bytecodedl/output/.*CallNode.csv" --relationships=Call="/bytecodedl/neo4j/CallEdgeHeader.csv,/bytecodedl/output/CallEdge.csv" --delimiter="\t" $dbname -if grep -q "dbms.active_database" /var/lib/neo4j/conf/neo4j.conf; then - sed -i -E "s/dbms.active_database=\w+/dbms.active_database=$dbname/g" /var/lib/neo4j/conf/neo4j.conf +if grep -q "#initial.dbms.default_database" /var/lib/neo4j/conf/neo4j.conf; then + sed -i -E "s/#initial.dbms.default_database=\S+/initial.dbms.default_database=$dbname/g" /var/lib/neo4j/conf/neo4j.conf else - echo "dbms.active_database=$dbname" >> /var/lib/neo4j/conf/neo4j.conf -fi \ No newline at end of file + sed -i -E "s/initial.dbms.default_database=\S+/initial.dbms.default_database=$dbname/g" /var/lib/neo4j/conf/neo4j.conf +fi diff --git a/neoImportChaCall-4.4.sh b/neoImportChaCall-4.4.sh new file mode 100644 index 0000000..ef763e4 --- /dev/null +++ b/neoImportChaCall-4.4.sh @@ -0,0 +1,11 @@ +#!/bin/bash + +dbname=$1$(date "+%m%d%H%M") + +neo4j-admin database import --nodes=Method="/bytecodedl/neo4j/CallNodeHeader.csv,/bytecodedl/output/.*CallNode.csv" --relationships=Call="/bytecodedl/neo4j/CallEdgeHeader.csv,/bytecodedl/output/CallEdge.csv" --relationships=Cha="/bytecodedl/neo4j/ChaEdgeHeader.csv,/bytecodedl/output/ChaGraph.csv" --database=$dbname --delimiter="\t" + +if grep -q "dbms.active_database" /var/lib/neo4j/conf/neo4j.conf; then + sed -i -E "s/dbms.active_database=\w+/dbms.active_database=$dbname/g" /var/lib/neo4j/conf/neo4j.conf +else + echo "dbms.active_database=$dbname" >> /var/lib/neo4j/conf/neo4j.conf +fi diff --git a/neoImportChaCall.sh b/neoImportChaCall.sh new file mode 100644 index 0000000..130c7bb --- /dev/null +++ b/neoImportChaCall.sh @@ -0,0 +1,11 @@ +#!/bin/bash + +dbname=$1$(date "+%m%d%H%M") + +neo4j-admin database import full --nodes=Method="/bytecodedl/neo4j/CallNodeHeader.csv,/bytecodedl/output/.*CallNode.csv" --relationships=Call="/bytecodedl/neo4j/CallEdgeHeader.csv,/bytecodedl/output/CallEdge.csv" --relationships=Cha="/bytecodedl/neo4j/ChaEdgeHeader.csv,/bytecodedl/output/ChaEdge.csv" --delimiter="\t" $dbname + +if grep -q "#initial.dbms.default_database" /var/lib/neo4j/conf/neo4j.conf; then + sed -i -E "s/#initial.dbms.default_database=\w+/initial.dbms.default_database=$dbname/g" /var/lib/neo4j/conf/neo4j.conf +else + sed -i -E "s/initial.dbms.default_database=\w+/initial.dbms.default_database=$dbname/g" /var/lib/neo4j/conf/neo4j.conf +fi