Rediscovering CVE-2023-36617 (ruby ReDoS) with fuzzing

summary

CVE-2023-36617

Two ReDoS bugs existed in Ruby's uri module. Both make the program hang on crafted input before it eventually raises an error (URI::InvalidComponentError for the port setter, URI::InvalidURIError for the parser).

They affect versions of the gem up to v0.12.1 and were fixed in v0.12.2.

The fix commit includes some tests that help explain what was going on.

The first test:

def test_rfc3986_port_check
  pre = ->(length) {"\t" * length + "a"}
  uri = URI.parse("http://my.example.com")
  assert_linear_performance((1..5).map {|i| 10**i}, pre: pre) do |port|
    assert_raise(URI::InvalidComponentError) do
      uri.port = port
    end
  end
end

assert_linear_performance checks that the run time grows roughly linearly as the input grows, here from 10 to 100,000 characters.

The root cause is a greedy regex match that was first introduced in commit 3e832346:

commit 3e832346a42d9412a0f1df0489ed1365ac8c195c
Author: naruse <naruse@b2dd03c8-39d4-4d8f-98ff-823fe69b080e>
Date:   Mon Jun 23 03:18:51 2014 +0000

    * lib/uri/generic.rb (check_port): allow strings for port= as
      described in rdoc.

    * lib/uri/rfc3986_parser.rb (regexp): implementation detail of above.

    git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@46504 b2dd03c8-39d4-4d8f-98ff-823fe69b080e

diff --git a/lib/uri/rfc3986_parser.rb b/lib/uri/rfc3986_parser.rb
index cd95ab8..aa74e11 100644
--- a/lib/uri/rfc3986_parser.rb
+++ b/lib/uri/rfc3986_parser.rb
@@ -84,7 +84,7 @@ module URI
         QUERY: /\A(?:%\h\h|[!$&-.0-;=@-Z_a-z~]|[\/?])*\z/,
         FRAGMENT: /\A(?:%\h\h|[!$&-.0-;=@-Z_a-z~]|[\/?])*\z/,
         OPAQUE: nil,
-        PORT: nil,
+        PORT: /\A[\x09\x0a\x0c\x0d ]*\d*[\x09\x0a\x0c\x0d ]*\z/,
       }
     end
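To get a feel for that pattern in isolation, here is a minimal timing sketch. It copies the PORT regex from the diff above into a constant I'm calling OLD_PORT (my name, not the library's) and times it against the same tab-heavy input the test builds. The helper time_port_match is hypothetical, and the numbers depend heavily on the Ruby version: newer Ruby releases include regex engine optimizations that can hide the slowdown entirely.

```ruby
require "benchmark"

# PORT pattern copied from the diff above. OLD_PORT is an illustrative name.
OLD_PORT = /\A[\x09\x0a\x0c\x0d ]*\d*[\x09\x0a\x0c\x0d ]*\z/

# Hypothetical helper: time one failing match per input length.
def time_port_match(lengths)
  lengths.map do |n|
    input = "\t" * n + "a" # same shape as the `pre` lambda in the test
    seconds = Benchmark.realtime { OLD_PORT.match?(input) }
    [n, seconds]
  end
end

time_port_match([1_000, 2_000, 4_000]).each do |n, seconds|
  puts format("%6d chars: %.6f s", n, seconds)
end
```

On an engine without mitigations, doubling the length should roughly quadruple the time; if the timings stay flat, the engine is probably protecting you.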

The second bug can be triggered through both URI::RFC2396_Parser#parse and URI::RFC2396_Parser#split.

The second test:

def test_rfc2822_parse_relative_uri
  pre = ->(length) {
    " " * length + "\0"
  }
  parser = URI::RFC2396_Parser.new
  assert_linear_performance((1..5).map {|i| 10**i}, pre: pre) do |uri|
    assert_raise(URI::InvalidURIError) do
      parser.split(uri)
    end
  end
end

This one was introduced in commit d8c414e9:

commit d8c414e99dda6cbb0bf91b9ad5f6a95321e00435
Author: naruse <naruse@b2dd03c8-39d4-4d8f-98ff-823fe69b080e>
Date:   Sun Jun 22 00:22:19 2014 +0000
...
diff --git a/lib/uri/rfc2396_parser.rb b/lib/uri/rfc2396_parser.rb
new file mode 100644
index 0000000..50e3ae6
--- /dev/null
+++ b/lib/uri/rfc2396_parser.rb
@@ -0,0 +1,543 @@
...
+      ret[:ABS_URI] = Regexp.new('\A\s*' + pattern[:X_ABS_URI] + '\s*\z', Regexp::EXTENDED)
+      ret[:REL_URI] = Regexp.new('\A\s*' + pattern[:X_REL_URI] + '\s*\z', Regexp::EXTENDED)
...

Both regexes date from 2014, so the bug stayed hidden for a long time.

This does seem hard to trigger, though: the bug lives in URI::RFC2396_Parser, and the default parser is URI::RFC3986_Parser. The URI module does have some functions that use RFC2396, but they are marked as obsolete:

def self.extract(str, schemes = nil, &block)
  warn "URI.extract is obsolete", uplevel: 1 if $VERBOSE
  DEFAULT_PARSER.extract(str, schemes, &block)
end

def self.regexp(schemes = nil)
  warn "URI.regexp is obsolete", uplevel: 1 if $VERBOSE
  DEFAULT_PARSER.make_regexp(schemes)
end

Also, I couldn't find any path that would lead from RFC3986 to RFC2396.

The core of the problem is something called catastrophic backtracking. If quantified expressions that can match the same characters (e.g. [\x09\x0a\x0c\x0d ]*) appear more than once in the same regex, then every time the match fails and the engine backtracks, it has to re-process the same characters for every possible way of splitting them between those expressions.

A better and more complete explanation is at: explosion explanation

Here is a neat visualization of what it looks like: explosion
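To put numbers on that, here is a toy backtracking matcher. It is my own simplified model, not how Ruby's regex engine is actually implemented: it only handles patterns shaped like \A c1* c2* c3* \z and counts how many characters it has to consume. The names match_from and count_steps are made up for this sketch.

```ruby
WHITESPACE = "\t\n\f\r ".freeze
DIGITS = "0123456789".freeze

# Try to match classes[idx..] against input[pos..], counting consumed characters.
def match_from(classes, input, idx, pos, counter)
  # All classes used up: succeed only at end of input (the \z anchor).
  return pos == input.length if idx == classes.length

  # Greedy: take as many characters of this class as possible.
  taken = 0
  while pos + taken < input.length && classes[idx].include?(input[pos + taken])
    taken += 1
    counter[:steps] += 1
  end

  # Backtrack: give characters back one at a time and retry the rest.
  taken.downto(0) do |kept|
    return true if match_from(classes, input, idx + 1, pos + kept, counter)
  end
  false
end

# Models \A[\t\n\f\r ]*\d*[\t\n\f\r ]*\z
def count_steps(input)
  counter = { steps: 0 }
  matched = match_from([WHITESPACE, DIGITS, WHITESPACE], input, 0, 0, counter)
  [matched, counter[:steps]]
end

[10, 100, 1000].each do |n|
  matched, steps = count_steps("\t" * n + "a")
  puts "n=#{n} matched=#{matched} steps=#{steps}"
end
# n=10 matched=false steps=65
# n=100 matched=false steps=5150
# n=1000 matched=false steps=501500
```

The step count is n + n(n+1)/2: every character the first quantifier gives back gets re-consumed by the third one, which is where the quadratic growth comes from.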

To fix that, the first quantifier is made possessive so that it cannot backtrack:

A++A+B$
  ^

A possessive quantifier basically says: once you have matched these characters, don't give them back.

after fix visualization
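A small sketch of the difference in Ruby, which supports the `*+` and `++` possessive forms. The PORT_POSSESSIVE pattern below is my own rewrite for illustration and is not necessarily the exact upstream patch.

```ruby
# Greedy: a* first takes all three "a"s, then gives one back so the final "a" can match.
greedy = /\Aa*a\z/
# Possessive: a*+ keeps everything it matched, so nothing is left for the final "a".
possessive = /\Aa*+a\z/

puts greedy.match?("aaa")     # true
puts possessive.match?("aaa") # false

# The original PORT pattern and an illustrative possessive variant (an assumption).
PORT_GREEDY     = /\A[\x09\x0a\x0c\x0d ]*\d*[\x09\x0a\x0c\x0d ]*\z/
PORT_POSSESSIVE = /\A[\x09\x0a\x0c\x0d ]*+\d*+[\x09\x0a\x0c\x0d ]*+\z/

samples = ["80", " 80 ", "\t8080\n", "", "80 80", "\t\ta", "abc"]
same = samples.all? { |s| PORT_GREEDY.match?(s) == PORT_POSSESSIVE.match?(s) }
puts same # true
```

The first pair shows that possessive quantifiers can change what a regex accepts. For the port pattern they should not, because each quantifier is followed by something that can never match the same characters (digits, or the end of the string), so nothing useful is ever found by backtracking.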

There was also a fix to the regex used to parse URIs that seems to have the same problem as the other bugs:

-    RFC3986_URI = /\A(?<URI>(?<scheme>[A-Za-z][+\-.0-9A-Za-z]*):(?<hier-part>\/\/(?<authority>(?:(?<userinfo>(?:%\h\h|[!$&-.0-;=A-Z_a-z~])*)@)?(?<host>(?<IP-literal>\[(?:(?<IPv6address>(?:\h{1,4}:){6}(?<ls32>\h{1,4}:\h{1,4}|(?<IPv4address>(?<dec-octet>[1-9]\d|1\d{2}|2[0-4]\d|25[0-5]|\d)\.\g<dec-octet>\.\g<dec-octet>\.\g<dec-octet>))|::(?:\h{1,4}:){5}\g<ls32>|\h{1,4}?::(?:\h{1,4}:){4}\g<ls32>|(?:(?:\h{1,4}:)?\h{1,4})?::(?:\h{1,4}:){3}\g<ls32>|(?:(?:\h{1,4}:){,2}\h{1,4})?::(?:\h{1,4}:){2}\g<ls32>|(?:(?:\h{1,4}:){,3}\h{1,4})?::\h{1,4}:\g<ls32>|(?:(?:\h{1,4}:){,4}\h{1,4})?::\g<ls32>|(?:(?:\h{1,4}:){,5}\h{1,4})?::\h{1,4}|(?:(?:\h{1,4}:){,6}\h{1,4})?::)|(?<IPvFuture>v\h+\.[!$&-.0-;=A-Z_a-z~]+))\])|\g<IPv4address>|(?<reg-name>(?:%\h\h|[!$&-.0-9;=A-Z_a-z~])*))(?::(?<port>\d*))?)(?<path-abempty>(?:\/(?<segment>(?:%\h\h|[!$&-.0-;=@-Z_a-z~])*))*)|(?<path-absolute>\/(?:(?<segment-nz>(?:%\h\h|[!$&-.0-;=@-Z_a-z~])+)(?:\/\g<segment>)*)?)|(?<path-rootless>\g<segment-nz>(?:\/\g<segment>)*)|(?<path-empty>))(?:\?(?<query>[^#]*))?(?:\#(?<fragment>(?:%\h\h|[!$&-.0-;=@-Z_a-z~\/?])*))?)\z/
-    RFC3986_relative_ref = /\A(?<relative-ref>(?<relative-part>\/\/(?<authority>(?:(?<userinfo>(?:%\h\h|[!$&-.0-;=A-Z_a-z~])*)@)?(?<host>(?<IP-literal>\[(?:(?<IPv6address>(?:\h{1,4}:){6}(?<ls32>\h{1,4}:\h{1,4}|(?<IPv4address>(?<dec-octet>[1-9]\d|1\d{2}|2[0-4]\d|25[0-5]|\d)\.\g<dec-octet>\.\g<dec-octet>\.\g<dec-octet>))|::(?:\h{1,4}:){5}\g<ls32>|\h{1,4}?::(?:\h{1,4}:){4}\g<ls32>|(?:(?:\h{1,4}:){,1}\h{1,4})?::(?:\h{1,4}:){3}\g<ls32>|(?:(?:\h{1,4}:){,2}\h{1,4})?::(?:\h{1,4}:){2}\g<ls32>|(?:(?:\h{1,4}:){,3}\h{1,4})?::\h{1,4}:\g<ls32>|(?:(?:\h{1,4}:){,4}\h{1,4})?::\g<ls32>|(?:(?:\h{1,4}:){,5}\h{1,4})?::\h{1,4}|(?:(?:\h{1,4}:){,6}\h{1,4})?::)|(?<IPvFuture>v\h+\.[!$&-.0-;=A-Z_a-z~]+))\])|\g<IPv4address>|(?<reg-name>(?:%\h\h|[!$&-.0-9;=A-Z_a-z~])+))?(?::(?<port>\d*))?)(?<path-abempty>(?:\/(?<segment>(?:%\h\h|[!$&-.0-;=@-Z_a-z~])*))*)|(?<path-absolute>\/(?:(?<segment-nz>(?:%\h\h|[!$&-.0-;=@-Z_a-z~])+)(?:\/\g<segment>)*)?)|(?<path-noscheme>(?<segment-nz-nc>(?:%\h\h|[!$&-.0-9;=@-Z_a-z~])+)(?:\/\g<segment>)*)|(?<path-empty>))(?:\?(?<query>[^#]*))?(?:\#(?<fragment>(?:%\h\h|[!$&-.0-;=@-Z_a-z~\/?])*))?)\z/
+    RFC3986_URI = /\A(?<URI>(?<scheme>[A-Za-z][+\-.0-9A-Za-z]*+):(?<hier-part>\/\/(?<authority>(?:(?<userinfo>(?:%\h\h|[!$&-.0-;=A-Z_a-z~])*+)@)?(?<host>(?<IP-literal>\[(?:(?<IPv6address>(?:\h{1,4}:){6}(?<ls32>\h{1,4}:\h{1,4}|(?<IPv4address>(?<dec-octet>[1-9]\d|1\d{2}|2[0-4]\d|25[0-5]|\d)\.\g<dec-octet>\.\g<dec-octet>\.\g<dec-octet>))|::(?:\h{1,4}:){5}\g<ls32>|\h{1,4}?::(?:\h{1,4}:){4}\g<ls32>|(?:(?:\h{1,4}:)?\h{1,4})?::(?:\h{1,4}:){3}\g<ls32>|(?:(?:\h{1,4}:){,2}\h{1,4})?::(?:\h{1,4}:){2}\g<ls32>|(?:(?:\h{1,4}:){,3}\h{1,4})?::\h{1,4}:\g<ls32>|(?:(?:\h{1,4}:){,4}\h{1,4})?::\g<ls32>|(?:(?:\h{1,4}:){,5}\h{1,4})?::\h{1,4}|(?:(?:\h{1,4}:){,6}\h{1,4})?::)|(?<IPvFuture>v\h++\.[!$&-.0-;=A-Z_a-z~]++))\])|\g<IPv4address>|(?<reg-name>(?:%\h\h|[!$&-.0-9;=A-Z_a-z~])*+))(?::(?<port>\d*+))?)(?<path-abempty>(?:\/(?<segment>(?:%\h\h|[!$&-.0-;=@-Z_a-z~])*+))*+)|(?<path-absolute>\/(?:(?<segment-nz>(?:%\h\h|[!$&-.0-;=@-Z_a-z~])++)(?:\/\g<segment>)*+)?)|(?<path-rootless>\g<segment-nz>(?:\/\g<segment>)*+)|(?<path-empty>))(?:\?(?<query>[^#]*+))?(?:\#(?<fragment>(?:%\h\h|[!$&-.0-;=@-Z_a-z~\/?])*+))?)\z/
+    RFC3986_relative_ref = /\A(?<relative-ref>(?<relative-part>\/\/(?<authority>(?:(?<userinfo>(?:%\h\h|[!$&-.0-;=A-Z_a-z~])*+)@)?(?<host>(?<IP-literal>\[(?:(?<IPv6address>(?:\h{1,4}:){6}(?<ls32>\h{1,4}:\h{1,4}|(?<IPv4address>(?<dec-octet>[1-9]\d|1\d{2}|2[0-4]\d|25[0-5]|\d)\.\g<dec-octet>\.\g<dec-octet>\.\g<dec-octet>))|::(?:\h{1,4}:){5}\g<ls32>|\h{1,4}?::(?:\h{1,4}:){4}\g<ls32>|(?:(?:\h{1,4}:){,1}\h{1,4})?::(?:\h{1,4}:){3}\g<ls32>|(?:(?:\h{1,4}:){,2}\h{1,4})?::(?:\h{1,4}:){2}\g<ls32>|(?:(?:\h{1,4}:){,3}\h{1,4})?::\h{1,4}:\g<ls32>|(?:(?:\h{1,4}:){,4}\h{1,4})?::\g<ls32>|(?:(?:\h{1,4}:){,5}\h{1,4})?::\h{1,4}|(?:(?:\h{1,4}:){,6}\h{1,4})?::)|(?<IPvFuture>v\h++\.[!$&-.0-;=A-Z_a-z~]++))\])|\g<IPv4address>|(?<reg-name>(?:%\h\h|[!$&-.0-9;=A-Z_a-z~])++))?(?::(?<port>\d*+))?)(?<path-abempty>(?:\/(?<segment>(?:%\h\h|[!$&-.0-;=@-Z_a-z~])*+))*+)|(?<path-absolute>\/(?:(?<segment-nz>(?:%\h\h|[!$&-.0-;=@-Z_a-z~])++)(?:\/\g<segment>)*+)?)|(?<path-noscheme>(?<segment-nz-nc>(?:%\h\h|[!$&-.0-9;=@-Z_a-z~])++)(?:\/\g<segment>)*+)|(?<path-empty>))(?:\?(?<query>[^#]*+))?(?:\#(?<fragment>(?:%\h\h|[!$&-.0-;=@-Z_a-z~\/?])*+))?)\z/

After reading about it, it seems surprising that no one found this bug earlier. It is a textbook catastrophic backtracking regex.

I was also going to try and write a high-effort summary of the inner workings of regex and explosions but fuzzing is just way more interesting.

fuzzing

I'm using afl-ruby, which is built on afl++. Since afl++ is a coverage-guided fuzzer, afl-ruby uses Ruby's TracePoint class to feed it coverage information.

afl-ruby
afl-ruby-article

So the first thing I tried was adding a trimmed-down version of the crash input to the corpus and seeing whether the fuzzer would find the hang.

"\t" * 10000 + "\0"
AFL_SKIP_BIN_CHECK=1 afl-fuzz -a text -i input -o output -- $(which ruby) fuzz.rb
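The post doesn't show fuzz.rb, so here is only a guess at the interesting part of such a harness: a function that pushes one input through both vulnerable code paths and swallows the exceptions the tests expect. The name fuzz_one is hypothetical, and the afl-ruby wiring (reading the input from the fuzzer and reporting coverage) is deliberately left out.

```ruby
require "uri"

# Hypothetical harness body: exercise both regexes with a single input string.
def fuzz_one(data)
  # Fuzzers produce arbitrary bytes; replace invalid sequences so the
  # regex match does not fail on string encoding before reaching the pattern.
  data = data.dup.force_encoding(Encoding::UTF_8).scrub("?")

  # Path 1: the PORT regex behind URI::Generic#port=
  uri = URI.parse("http://my.example.com")
  begin
    uri.port = data
  rescue URI::InvalidComponentError
    # Expected for anything that is not a valid port; not a finding.
  end

  # Path 2: the ABS_URI / REL_URI regexes behind URI::RFC2396_Parser#split
  begin
    URI::RFC2396_Parser.new.split(data)
  rescue URI::InvalidURIError
    # Expected for malformed URIs; not a finding.
  end

  nil
end

fuzz_one("\t" * 100 + "a")
fuzz_one(" " * 100 + "\0")
```

Since the bug is a hang rather than an exception, what the fuzzer needs to catch here is the timeout, not a crash.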

After a couple of hours it had found nothing, and the corpus had stopped growing after the first minute. I tried messing with the options, but it didn't change anything.

So I tried fuzzing the two different versions of the gem to see if I could find anything interesting. And I did find some quirks between the two versions (v0.12.1, v0.12.2).

The first one was:

legend = [:scheme, :userinfo, :host, :port, :registry, :path, :opaque, :query, :fragment]
component_ary = [nil, ":", nil, nil, nil, "/:", nil, nil, nil] # v0.12.1
component_ary = [nil, ":", "", nil, nil, "/:", nil, nil, nil] # v0.12.2

So the host is now "" instead of nil, but soon I found out that this was expected:

commit 81263c9e94bd67ca01deee238842a88c2c8885f3
Author: NARUSE, Yui <naruse@airemix.jp>
Date:   Sun Jan 13 08:58:00 2019 +0900

    URI.parse should set empty string in host instead of nil

    ruby/ruby@dd5118f8524c425894d4716b787837ad7380bb0d

Very helpful commit message
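For anyone not used to the component array above, this is a minimal sketch that labels what URI.split returns using the same legend. LEGEND and labelled_components are names I made up for the example.

```ruby
require "uri"

# Same order as the legend above.
LEGEND = %i[scheme userinfo host port registry path opaque query fragment].freeze

# Pair each component returned by URI.split with its name.
def labelled_components(uri_string)
  LEGEND.zip(URI.split(uri_string)).to_h
end

pp labelled_components("http://user@example.com:8080/path?q=1#frag")
```

Fields that are missing from the input come back as nil, which is why a host that switches from nil to "" shows up so clearly when diffing two versions.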

There was also a change in to_s: it now adds double slashes, which is directly related to the above commit.

input = "//:@:/:"
old = ":@/:" # v0.12.1
new = "//:@/:" # v0.12.2
# In v0.12.1 host would be nil, making this check false; changing it to "" makes the check pass.
if @host || %w[file postgres].include?(@scheme)
  str << '//'
end

The third difference was:

input = "//::"
old_parser = nil # Bad URI exception
parser = "//::" # component_ary = [path]

input = "//p:x"
old_parser = nil # Bad URI exception
parser = "//p:x" # component_ary = [path]

input = "//@@?."
old_parser = nil # Bad URI exception
parser = "//@@" # component_ary = [path, query]

input = "//mmai:f#tZ"
old_parser = nil # Bad URI exception
parser = "//mmai:f" # component_ary = [path, fragment]

These are kinda interesting but don't seem to have any security implications.

grammar mutations

After a bit, I moved on from the diffing and tried out grammar mutators, targeting the PORT regex:

/\A[\x09\x0a\x0c\x0d ]*\d*[\x09\x0a\x0c\x0d ]*\z/

The grammar that I came up with:

{
  "<port>": [
    ["<spaces>", "<digits>", "<spaces>"]
  ],
  "<digits>": [["<digit-1>"]],
  "<digit-1>": [[], ["<digit>"], ["<digit>", "<digit-1>"]],
  "<digit>": [["0"], ["1"], ["2"], ["3"], ["4"], ["5"], ["6"], ["7"], ["8"]],
  "<spaces>": [["<space-1>"]],
  "<space-1>": [[], ["<space>"], ["<space>", "<space-1>"]],
  "<space>": [["\u0009"], ["\u000a"], ["\u000c"], ["\u000d"], ["\u0000"]]
}

This looks to be the better option, since it generated the biggest corpus. But it also seems to get stuck after a while.
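To see what kind of inputs that grammar describes, here is a small random expander. It is only a sketch: GRAMMAR is the JSON above transcribed into a Ruby hash, expand is a hypothetical helper, and none of this is how the afl++ grammar mutator works internally.

```ruby
# The grammar above, transcribed as a Ruby hash (same rules, same terminals).
GRAMMAR = {
  "<port>"    => [["<spaces>", "<digits>", "<spaces>"]],
  "<digits>"  => [["<digit-1>"]],
  "<digit-1>" => [[], ["<digit>"], ["<digit>", "<digit-1>"]],
  "<digit>"   => ("0".."8").map { |d| [d] },
  "<spaces>"  => [["<space-1>"]],
  "<space-1>" => [[], ["<space>"], ["<space>", "<space-1>"]],
  "<space>"   => [["\t"], ["\n"], ["\f"], ["\r"], ["\0"]]
}.freeze

# Expand a symbol by picking one of its rules at random; anything that is
# not a key in GRAMMAR is treated as a terminal and returned unchanged.
def expand(symbol, rng)
  rules = GRAMMAR[symbol]
  return symbol if rules.nil?

  rules.sample(random: rng).map { |part| expand(part, rng) }.join
end

rng = Random.new(1234)
5.times { p expand("<port>", rng) }
```

With a uniform choice between the three rules, the recursive ones stop quickly, so most generated strings are short. That may be part of why a grammar like this needs many mutations before it produces the long whitespace run that triggers the hang.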

After a while I realized that, since afl-ruby uses TracePoint, it can't see into Ruby internals like the regex engine: TracePoint only records that a C function was called, not what happens inside it.
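A quick way to check this is to record the C calls TracePoint reports during a regex match. I don't know exactly which events afl-ruby subscribes to, so the use of :c_call here is an assumption, and c_calls_during_match is a made-up helper.

```ruby
# Collect the C-level method calls TracePoint reports while matching.
def c_calls_during_match(regex, input)
  calls = []
  trace = TracePoint.new(:c_call) do |tp|
    calls << "#{tp.defined_class}##{tp.method_id}"
  end
  trace.enable { regex.match?(input) }
  calls
end

port = /\A[\x09\x0a\x0c\x0d ]*\d*[\x09\x0a\x0c\x0d ]*\z/

short = c_calls_during_match(port, "\t" * 10 + "a")
long  = c_calls_during_match(port, "\t" * 2_000 + "a")

p short
p short == long
```

If the two traces are identical, an input that makes the regex engine do far more work looks exactly the same to the fuzzer as a cheap one, so coverage gives no signal that it is getting closer to the hang.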

So I guess I can try different mutations and trust the fuzzer?

it found it

After about 9 hours of fuzzing, 6 of those without the corpus growing, it found the hang. Maybe it could also find it without the grammar mutator?

Without the grammar mutator, it found the hang after about 20 minutes.

The command used:

AFL_SKIP_BIN_CHECK=1 afl-fuzz -i input -o output_default -t 500 -P exploit -- $(which ruby) fuzz.rb

The input corpus was only a single file containing a single whitespace character.

I think the -P exploit is what changed the result here.

afl++

Conclusion

This bug is interesting for fuzzing: afl found it in 20 minutes using the -P exploit flag. Of course, I already knew that the bug existed and how to find it, but afl also found it with almost nothing in the corpus (just a file with a single whitespace in it).

On the other hand, the bug seems kinda obvious if you understand how regexes work. But regex is hard, and I guess no one checked it.


Notes

  1. It seems that v0.12.1 was an incomplete fix for a similar issue from earlier this year, CVE-2023-28755.

  2. Taking a second look at the code there is the following regex:

    /\A(?:[^@,;]+@[^@,;]+(?:\z|[,;]))*\z/

    This looks like it would be vulnerable to the same problem, but I could not reproduce it. I think it's because the @ in the middle serves as a checkpoint, so the engine doesn't backtrack through the whole regex.
