Discussion:
TCP payload reassembly
mark
2008-01-25 14:59:51 UTC
Can anyone help me understand the TCP reassembly process and how I can apply it to a script using scapy? All the articles I have read on the subject rely on the More Fragments bit set to 1 but I have an entire capture file that is full of "parts of a reassembled PDU" that do not have that bit set.

I am attaching a small capture file with a gif image in it that was downloaded (to practice with; hopefully your filters won't delete this file). I know that I need to pay attention to the sequence numbers and all that, but how do I know when a packet is fragmented, and when I have received all the packets needed to reassemble the payload?

So the script will do something like this:

When I see a fragmented packet:
grab the appropriate information to know what to expect in the next packet

when I have all the packets:
reassemble the payload

if the payload is a gif image:
print "found a gif image"
wrpcap("gif.gif",reassembled_gif_image_from_packet_data)
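The "if the payload is a gif image" step of this plan can be done by checking the file signature: every GIF file starts with the six bytes `GIF87a` or `GIF89a`. A minimal sketch, assuming the reassembled HTTP body is already available as a `bytes` object (`is_gif` and `save_if_gif` are hypothetical names). Note that scapy's `wrpcap()` writes a list of packets into a pcap capture file, so saving the image itself needs an ordinary binary file write:

```python
GIF_SIGNATURES = (b"GIF87a", b"GIF89a")


def is_gif(data: bytes) -> bool:
    """Return True if data starts with one of the two GIF signatures."""
    return data[:6] in GIF_SIGNATURES


def save_if_gif(data: bytes, path: str = "gif.gif") -> bool:
    """Write data to path as a raw binary file if it looks like a GIF.

    Hypothetical helper: plain open()/write() is used because wrpcap()
    would produce a pcap capture file, not an image file.
    """
    if not is_gif(data):
        return False
    print("found a gif image")
    with open(path, "wb") as f:
        f.write(data)
    return True
```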

I would very much appreciate any help you can provide.
Thanks in advance!

-Mark


Sake Blok
2008-01-25 15:36:32 UTC
Post by mark
Can anyone help me understand the TCP reassembly process and how
I can apply it to a script using scapy? All the articles I have
read on the subject rely on the More Fragments bit set to 1 but
I have an entire capture file that is full of "parts of a
reassembled PDU" that do not have that bit set.
You seem to be mixing IP fragmentation and TCP segmentation. When
an IP packet traverses a network with a smaller MTU than the packet
needs, it can be fragmented into several IP fragments. Each fragment's
IP header carries a "fragment offset" value, and every fragment except
the last one has the "more fragments" bit set. The IP payload (TCP,
UDP or another protocol) is split across these IP fragments.
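To make the IP side of the distinction concrete, here is a minimal sketch of IP-level reassembly in plain Python. It assumes each fragment of a single datagram (same source, destination, protocol and IP ID) has already been parsed into a hypothetical `(frag_offset, more_fragments, payload)` tuple; the offset field counts 8-byte units. Overlapping fragments and reassembly timeouts are not handled. (For real packets, scapy also provides its own `defragment()` function.)

```python
from typing import List, Optional, Tuple

# (fragment offset in 8-byte units, "more fragments" flag, IP payload bytes)
Fragment = Tuple[int, bool, bytes]


def reassemble_ip_fragments(frags: List[Fragment]) -> Optional[bytes]:
    """Rebuild one IP payload from its fragments, or return None if incomplete."""
    frags = sorted(frags, key=lambda f: f[0])
    data = b""
    for offset, more, payload in frags:
        start = offset * 8          # the offset field is in 8-byte units
        if start != len(data):      # gap or overlap: not handled in this sketch
            return None
        data += payload
        if not more:                # the last fragment has MF == 0
            return data
    return None                     # the last fragment was never seen
```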

TCP segmentation is what happens when a higher layer protocol
(like HTTP) transports an object that is larger than the MSS for
the medium over which it will be transported. Since TCP is a streaming
protocol, it has no knowledge about where one object starts and
ends. It just transports the data it receives in its buffers.
Therefore there are no TCP fragment flags. TCP uses sequence numbers
to deliver the data complete and in order to its higher layer
protocol on the receiving end. It is the responsibility of that
higher layer protocol to mark the beginning and end of the objects
it carries.
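Because there is no fragment flag, one direction of a TCP stream is rebuilt purely from the sequence numbers. A minimal sketch, assuming the segments of a single direction have already been extracted as hypothetical `(seq, payload)` pairs and that the initial sequence number is known from the SYN. Retransmitted copies and partial overlaps are trimmed, reassembly stops at the first missing segment, and sequence-number wraparound (modulo 2**32) is ignored:

```python
from typing import Iterable, Tuple


def reassemble_tcp(isn: int, segments: Iterable[Tuple[int, bytes]]) -> bytes:
    """Return the contiguous byte stream that follows the SYN.

    isn is the sequence number carried by the SYN; the SYN itself consumes
    one sequence number, so the first data byte has sequence number isn + 1.
    """
    next_seq = isn + 1
    stream = bytearray()
    for seq, payload in sorted(segments, key=lambda s: s[0]):
        end = seq + len(payload)
        if end <= next_seq:          # retransmission of data we already have
            continue
        if seq > next_seq:           # hole: a segment is missing, stop here
            break
        stream += payload[next_seq - seq:]   # drop any overlapping prefix
        next_seq = end
    return bytes(stream)
```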

When you see "[TCP segment of a reassembled PDU]" in Wireshark, it
means that Wireshark mimics the behaviour of the higher layer
protocol and reassembles the separate TCP segments into one PDU.
Post by mark
I am attaching a small capture file with a gif image in it that
was downloaded (to practice with; hopefully your filters won't
delete this file). I know that I need to pay attention to the
sequence numbers and all that, but how do I know when a packet is
fragmented, and when I have received all the packets needed to
reassemble the payload?
grab the appropriate information to know what to expect in the next packet
reassemble the payload
print "found a gif image"
wrpcap("gif.gif",reassembled_gif_image_from_packet_data)
I would very much appreciate any help you can provide.
As I'm still lurking on the scapy list, I'm afraid I'm not able to
help you with the scapy-scripting side of your question...

Cheers,
Sake

---------------------------------------------------------------------
To unsubscribe, send a mail to scapy.ml-***@secdev.org
mark
2008-01-25 16:27:46 UTC
Thank you very much for your reply! So I have a few questions...

So if I wanted to find the beginning of a payload that is split up, would I have to know what the application markings are for the "header" and "footer" of <given file type>? Does it use standard "magic" values for files? For example, \xFF\xD8 begins a JPEG and \xFF\xD9 ends the JPEG? Or is there some value set somewhere giving the length of the streamed data? Do you by chance know how Wireshark understands this?

Based on the capture file that I attached, can you walk me through Wireshark's logic?

SYN
SYN/ACK
ACK
data transfer split up
FIN

Does it just look at the handshake and then collect data until it gets a FIN? It has to be more complicated than that :)


Sake Blok
2008-01-25 18:00:16 UTC
Post by mark
Thank you very much for your reply! So I have a few questions ....
We're getting a little off topic here, but I think we can get away with
it ;-)

BTW, the wireshark-users mailing list is a great place to discuss the
things you see in Wireshark but don't always quite understand.
Post by mark
So if I wanted to find the beginning of a payload that is split
up, would I have to know what the application markings are for the
"header" and "footer" of <given file type>? Does it use standard
"magic" values for files? For example, \xFF\xD8 begins a JPEG and
\xFF\xD9 ends the JPEG?
No
Post by mark
Or is there some value set somewhere giving
the length of the streamed data?
Yes and no: that depends on the implementation of the http-server.
Post by mark
Do you by chance know how Wireshark understands this?
Not in very great detail, but I know the general idea...
Post by mark
Based on the capture file that I attached, can you walk me through Wireshark's logic?
SYN
SYN/ACK
ACK
This is the so-called three-way handshake, in which the session is
set up. The initial sequence numbers for both directions of the
connection are chosen, and some other options (such as the MSS)
might be set.
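Once the handshake has fixed the initial sequence number (ISN) of a direction, any later sequence number in that direction can be turned into a byte offset within the stream. A small sketch of that arithmetic (`stream_offset` is a hypothetical name); the modulo is there because TCP sequence numbers are 32-bit and wrap around:

```python
def stream_offset(isn: int, seq: int) -> int:
    """Byte offset of seq within the stream whose SYN carried isn.

    The SYN consumes one sequence number, so the first data byte is at
    isn + 1, i.e. offset 0. Arithmetic is modulo 2**32 because TCP
    sequence numbers wrap around.
    """
    return (seq - (isn + 1)) % 2**32
```

For example, with an ISN of 1000 the first data byte carries sequence number 1001 (offset 0), and a segment starting at 2589 begins 1588 bytes into the stream.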
Post by mark
data transfer split up
In frame 4 you can see the client's request to the server. This
request ends with a double "CR/LF", which tells the http-daemon that
it can start to process the request according to all the http-headers
in the request.

In frame 5, the tcp stack on the http-server acknowledges the
received data. If it doesn't do this within a certain timeout, the
client's tcp stack will assume the segment was lost and will send
it again.

Then the http-daemon starts to transmit the requested object. As
the object does not fit into one TCP segment, the tcp stack splits
it into frames 6 and 8. Wireshark knows, just like your browser,
that frame 6 did not contain the whole object. How? Have a look at
the http headers in the response. The header "Content-Length: 1588"
tells the client that it should receive 1588 bytes of data after
the http-headers, which end at a double "CR/LF", just like in the
request.
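That completeness check can be sketched in a few lines: find the blank line (CR LF CR LF) that ends the headers, read `Content-Length`, and compare it with the number of body bytes reassembled so far. This is a simplified sketch with a hypothetical function name, assuming ASCII headers and a response that carries `Content-Length`; responses that use chunked encoding or simply close the connection instead are out of scope and raise `ValueError`:

```python
from typing import Optional


def http_body_if_complete(buf: bytes) -> Optional[bytes]:
    """Return the body once all Content-Length bytes are present, else None."""
    head_end = buf.find(b"\r\n\r\n")
    if head_end < 0:
        return None                      # headers not complete yet
    headers = buf[:head_end].decode("ascii", "replace").split("\r\n")
    length = None
    for line in headers[1:]:             # skip the status line
        name, _, value = line.partition(":")
        if name.strip().lower() == "content-length":
            length = int(value.strip())
    if length is None:
        raise ValueError("no Content-Length header; not handled in this sketch")
    body = buf[head_end + 4:]
    if len(body) < length:
        return None                      # wait for more TCP segments
    return body[:length]
```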
Post by mark
FIN
These packets tell the tcp stack on both sides to tear down
the connection and free up the resources for other connections.
Post by mark
does it just look at the handshake then collect data until it
gets a FIN?
In the 'old' days: yes. Before http-keepalives were introduced,
that is exactly what the browser did to know when to stop collecting
data for the object. But now you can send more than one object
in one TCP session, so there has to be some communication on what
amount of data to expect.
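With keep-alive, one direction of a connection can carry several responses back to back, and the length header is what tells a parser where one object ends and the next begins. A minimal sketch with a hypothetical function name, assuming the stream has already been reassembled in order and every response carries `Content-Length` (a missing header is treated as an empty body here; chunked encoding and bodiless responses such as 304 are not handled):

```python
from typing import List


def split_responses(stream: bytes) -> List[bytes]:
    """Split a reassembled server-to-client stream into individual bodies."""
    bodies = []
    pos = 0
    while pos < len(stream):
        head_end = stream.find(b"\r\n\r\n", pos)
        if head_end < 0:
            break                                   # incomplete headers
        length = 0                                  # assumption: no header = empty body
        for line in stream[pos:head_end].split(b"\r\n")[1:]:
            name, _, value = line.partition(b":")
            if name.strip().lower() == b"content-length":
                length = int(value.strip())
        body_start = head_end + 4
        if body_start + length > len(stream):
            break                                   # incomplete body
        bodies.append(stream[body_start:body_start + length])
        pos = body_start + length
    return bodies
```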
Post by mark
It has to be more complicated than that :)
Well, it gets even more complicated when "Transfer-Encoding: Chunked"
is used. But I will leave that up to you to read a book on HTTP or
perhaps the RFC about HTTP ;-)
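For the curious, the chunked format itself is simple: each chunk is a hexadecimal size line, the data, and a CR LF, and a chunk of size zero ends the body (RFC 2616, section 3.6.1). A minimal decoder sketch, assuming the whole body is already reassembled; chunk extensions are skipped and trailer headers are ignored:

```python
def decode_chunked(body: bytes) -> bytes:
    """Decode an HTTP/1.1 chunked body into the original bytes."""
    out = bytearray()
    pos = 0
    while True:
        line_end = body.index(b"\r\n", pos)
        size_field = body[pos:line_end].split(b";", 1)[0]  # drop chunk extensions
        size = int(size_field.strip(), 16)
        if size == 0:
            return bytes(out)           # last chunk; trailers (if any) ignored
        start = line_end + 2
        out += body[start:start + size]
        pos = start + size + 2          # skip the chunk data and its CR LF
```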

Cheers,
Sake


PS In recent versions of Wireshark, you are able to export the
individual objects that are transported over http. Have a look
at "File | export" when you have a tracefile open with
some http traffic.

Philippe Biondi
2008-01-25 17:30:29 UTC
Post by mark
SYN
SYN/ACK
ACK
data transfer split up
FIN
does it just look at the handshake then collect data until it gets a
FIN? It has to be more complicated than that :)
Sorry but, at this point, you'll have to read
http://www.ietf.org/rfc/rfc793.txt
--
Philippe Biondi <phil@ secdev.org> SecDev.org
Computer Security/R&D http://www.secdev.org
PGP KeyID:3D9A43E2 FingerPrint:C40A772533730E39330DC0985EE8FF5F3D9A43E2

Philippe Biondi
2008-01-25 17:12:52 UTC
Post by mark
Can anyone help me understand the TCP reassembly process and how I can
apply it to a script using scapy? All the articles I have read on the
subject rely on the More Fragments bit set to 1 but I have an entire
capture file that is full of "parts of a reassembled PDU" that do not
have that bit set.
You're confusing IP defragmentation and TCP stream reassembly.
Post by mark
I am attaching a small capture file with a gif image in it that was
downloaded (to practice with; hopefully your filters won't delete this
file). I know that I need to pay attention to the sequence numbers and
all that, but how do I know when a packet is fragmented, and when I have
received all the packets needed to reassemble the payload?
>>> a=rdpcap("/tmp/frag_gif.cap")
>>> a.show()
0000 Ether / IP / TCP 192.168.1.34:43096 > 87.248.217.110:www S
0001 Ether / IP / TCP 87.248.217.110:www > 192.168.1.34:43096 SA
0002 Ether / IP / TCP 192.168.1.34:43096 > 87.248.217.110:www A
0003 Ether / IP / TCP 192.168.1.34:43096 > 87.248.217.110:www PA / Raw
0004 Ether / IP / TCP 87.248.217.110:www > 192.168.1.34:43096 A
0005 Ether / IP / TCP 87.248.217.110:www > 192.168.1.34:43096 A / Raw
0006 Ether / IP / TCP 192.168.1.34:43096 > 87.248.217.110:www A
0007 Ether / IP / TCP 87.248.217.110:www > 192.168.1.34:43096 PA / Raw
0008 Ether / IP / TCP 192.168.1.34:43096 > 87.248.217.110:www A
0009 Ether / IP / TCP 87.248.217.110:www > 192.168.1.34:43096 FA
0010 Ether / IP / TCP 192.168.1.34:43096 > 87.248.217.110:www FA
0011 Ether / IP / TCP 87.248.217.110:www > 192.168.1.34:43096 A

Considering there is no retransmission, no reordering needed, etc., you can get:
>>> stream="".join(p.load for p in a[Raw])
>>> req,ans,content=stream.split("\r\n\r\n",2)
>>> "gzip" in ans
True
>>> import zlib
>>> gif=zlib.decompress(content,40)

Here is your gif file.
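The session above is Python 2. With Python 3 and current scapy, `Raw.load` is `bytes`, so the join and the separators must be bytes too. A minimal sketch of the same steps as a function taking the list of payloads (for example `[p.load for p in a[Raw]]`), under the same assumptions: a single GET request and its response, no retransmissions or reordering, and a gzip-encoded body. `extract_gzipped_body` is a hypothetical name; `zlib.MAX_WBITS | 32` asks zlib to auto-detect the gzip header:

```python
import zlib
from typing import List


def extract_gzipped_body(payloads: List[bytes]) -> bytes:
    """Join TCP payloads in capture order and return the decompressed body."""
    stream = b"".join(payloads)
    # request headers | response headers | response body
    req, ans, content = stream.split(b"\r\n\r\n", 2)
    if b"gzip" not in ans:
        return content                  # body was not compressed
    return zlib.decompress(content, zlib.MAX_WBITS | 32)
```

The result can then be saved with `open("gif.gif", "wb").write(...)` rather than `wrpcap()`, which writes pcap files.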


But the general case is much, much more complicated...
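One of the first complications in the general case is that a capture usually mixes several connections and both directions of each. A minimal sketch of the demultiplexing step, assuming packets have already been reduced to hypothetical `(src, sport, dst, dport, seq, payload)` tuples (with scapy these would come from fields such as `p[IP].src`, `p[TCP].sport` and `p[TCP].seq`). Each resulting list still has to be put back in order by sequence number, with duplicates and gaps handled:

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

FlowKey = Tuple[str, int, str, int]          # (src, sport, dst, dport): one direction
Record = Tuple[str, int, str, int, int, bytes]


def split_by_direction(records: Iterable[Record]) -> Dict[FlowKey, List[Tuple[int, bytes]]]:
    """Group (seq, payload) pairs by the connection direction they belong to."""
    flows: Dict[FlowKey, List[Tuple[int, bytes]]] = defaultdict(list)
    for src, sport, dst, dport, seq, payload in records:
        if payload:                          # pure ACKs carry no stream data
            flows[(src, sport, dst, dport)].append((seq, payload))
    return dict(flows)
```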
Dirk Loss
2008-01-26 09:05:25 UTC
Post by mark
Can anyone help me understand the TCP reassembly process
and how I can apply it to a script using scapy?
I guess it's easier to have the TCP streams reassembled by a special
library such as libnids. A wrapper called pynids [1] is available, so
that you can use it from your Python scripts.
And have a look at flowgrep [2], which uses pynids. Maybe it already
does what you want.

I have been wondering for some time if Scapy and pynids could be
combined in some useful way. Any ideas?

Regards
Dirk

[1] http://pilcrow.madison.wi.us/pynids/
[2] http://monkey.org/~jose/software/flowgrep/

