org.apache.james.mime4j
Class MimeTokenStream

java.lang.Object
  extended by org.apache.james.mime4j.MimeTokenStream

public class MimeTokenStream
extends java.lang.Object

Parses MIME (or RFC822) message streams of bytes or characters. The stream is converted into an event stream.

Typical usage:

      MimeTokenStream stream = new MimeTokenStream();
      stream.parse(new BufferedInputStream(new FileInputStream("mime.msg")));
      for (int state = stream.getState();
           state != MimeTokenStream.T_END_OF_STREAM;
           state = stream.next()) {
          switch (state) {
            case MimeTokenStream.T_BODY:
              System.out.println("Body detected, contents = "
                + stream.getInputStream() + ", header data = "
                + stream.getBodyDescriptor());
              break;
            case MimeTokenStream.T_FIELD:
              System.out.println("Header field detected: "
                + stream.getField());
              break;
            case MimeTokenStream.T_START_MULTIPART:
              System.out.println("Multipart message detected,"
                + " header data = "
                + stream.getBodyDescriptor());
            ...
          }
      }
 

NOTE: All lines must end with CRLF (\r\n). If you are unsure of the line endings in your stream, wrap it in an EOLConvertingInputStream instance.
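
To make the CRLF requirement concrete, the following is a minimal sketch of what line-ending normalization involves. It does not use or reproduce the library's EOLConvertingInputStream; the class CrlfNormalizingInputStream and its normalize helper are hypothetical names introduced only for this illustration, and the real class may behave differently in edge cases.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

/** Hypothetical stand-in (not the library class): rewrites bare CR or bare LF as CRLF. */
final class CrlfNormalizingInputStream extends InputStream {
    private final PushbackInputStream in;
    private int pending = -1; // byte still to be emitted by the next read(), or -1

    CrlfNormalizingInputStream(InputStream source) {
        this.in = new PushbackInputStream(source, 1);
    }

    @Override
    public int read() throws IOException {
        if (pending != -1) {
            int b = pending;
            pending = -1;
            return b;
        }
        int b = in.read();
        if (b == '\r') {
            int next = in.read();
            if (next != '\n' && next != -1) {
                in.unread(next); // bare CR: the following byte is ordinary data
            }
            pending = '\n';
            return '\r';
        }
        if (b == '\n') { // bare LF: emit CR now and LF on the next call
            pending = '\n';
            return '\r';
        }
        return b;
    }

    @Override
    public void close() throws IOException {
        in.close();
    }

    /** Convenience helper for the demo: normalizes a whole string in memory. */
    static String normalize(String text) {
        byte[] raw = text.getBytes(StandardCharsets.ISO_8859_1);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (InputStream s = new CrlfNormalizingInputStream(new ByteArrayInputStream(raw))) {
            for (int b = s.read(); b != -1; b = s.read()) {
                out.write(b);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new String(out.toByteArray(), StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String mixed = "Subject: test\nX-A: 1\r\n\rbody";
        // Show the line endings visibly instead of printing control characters.
        System.out.println(normalize(mixed).replace("\r", "\\r").replace("\n", "\\n"));
    }
}
```

In real code the wrapped stream, rather than the original one, would be the argument passed to parse(InputStream).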

Instances of MimeTokenStream are reusable: invoking parse(InputStream) resets the token stream's internal state. However, they are not thread-safe. In a multithreaded application, the suggested use is one instance per thread.
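
One common way to get one instance per thread is a ThreadLocal. The sketch below shows only the pattern; to stay self-contained it uses a hypothetical stand-in class named Parser instead of MimeTokenStream. In an application the initializer would presumably be MimeTokenStream::new, since the class has a public no-argument constructor.

```java
/** Sketch: one reusable parser instance per thread, held in a ThreadLocal. */
final class PerThreadParserSketch {

    /** Hypothetical stand-in for a reusable but non-thread-safe parser. */
    static final class Parser { }

    // Each thread lazily receives its own Parser the first time it calls get().
    private static final ThreadLocal<Parser> LOCAL = ThreadLocal.withInitial(Parser::new);

    static Parser get() {
        return LOCAL.get();
    }

    /** Returns the instance that a freshly started thread observes. */
    static Parser instanceSeenByNewThread() {
        Parser[] seen = new Parser[1];
        Thread worker = new Thread(() -> seen[0] = get());
        worker.start();
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen[0];
    }

    public static void main(String[] args) {
        System.out.println("same thread reuses its instance: " + (get() == get()));
        System.out.println("new thread gets its own instance: " + (get() != instanceSeenByNewThread()));
    }
}
```

Because parse(InputStream) resets the internal state, each thread can keep reusing its own instance for successive messages.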

Version:
$Id: MimeStreamParser.java,v 1.8 2005/02/11 10:12:02 ntherning Exp $

Nested Class Summary
static class MimeTokenStream.Event
          Enumerates events which can be monitored.
 
Field Summary
static int M_NO_RECURSE
          Do not recurse into message/rfc822 parts
static int M_RAW
          Parse into raw entities
static int M_RECURSE
          Recursively parse every message/rfc822 part
static int T_BODY
          This token indicates that an atomic entity is being parsed.
static int T_END_BODYPART
          This token indicates that the MIME stream is currently at the end of a body part.
static int T_END_HEADER
          This token indicates that part headers have now been parsed.
static int T_END_MESSAGE
          This token indicates that the MIME stream is currently at the end of a message.
static int T_END_MULTIPART
          This token indicates that a multipart body has been parsed.
static int T_END_OF_STREAM
          This token indicates that the MIME stream has been completely and successfully parsed, and no more data is available.
static int T_EPILOGUE
          This token indicates that a multipart's epilogue is being parsed.
static int T_FIELD
          This token indicates that a message part's field has now been parsed.
static int T_PREAMBLE
          This token indicates that a multipart's preamble is being parsed.
static int T_RAW_ENTITY
          This token indicates that a raw entity is currently being processed.
static int T_START_BODYPART
          This token indicates that the MIME stream is currently at the beginning of a body part.
static int T_START_HEADER
          This token indicates that a message part's headers are now being parsed.
static int T_START_MESSAGE
          This token indicates that the MIME stream is currently at the beginning of a message.
static int T_START_MULTIPART
          This token indicates that a multipart body is being parsed.
 
Constructor Summary
MimeTokenStream()
          Constructs a standard (lax) stream.
 
Method Summary
static MimeTokenStream createStrictValidationStream()
          Creates a stream that strictly validates the input.
protected  void debug(MimeTokenStream.Event event)
          Logs (at debug) an indicative message based on the given event and the current state of the system.
 BodyDescriptor getBodyDescriptor()
          Gets a descriptor for the current entity.
 java.lang.String getField()
          This method is valid if getState() returns T_FIELD.
 java.lang.String getFieldName()
          This method is valid if getState() returns T_FIELD.
 java.lang.String getFieldValue()
          This method is valid if getState() returns T_FIELD.
 java.io.InputStream getInputStream()
          This method is valid if getState() returns T_RAW_ENTITY, T_PREAMBLE, or T_EPILOGUE.
 java.io.Reader getReader()
          Gets a reader configured for the current body or body part.
 int getRecursionMode()
          Gets the current recursion mode.
 int getState()
          Returns the current state.
 boolean isRaw()
          Determines if this parser is currently in raw mode.
protected  java.lang.String message(MimeTokenStream.Event event)
          Creates an indicative message suitable for display based on the given event and the current state of the system.
protected  void monitor(MimeTokenStream.Event event)
          Monitors the given event.
protected  BodyDescriptor newBodyDescriptor(BodyDescriptor pParent)
          Creates a new instance of BodyDescriptor.
 int next()
          This method advances the token stream to the next token.
 void parse(java.io.InputStream stream)
          Instructs the MimeTokenStream to parse the given stream's contents.
 void setRaw(boolean raw)
          Deprecated. Pass M_RAW to setRecursionMode(int) instead.
 void setRecursionMode(int mode)
          Sets the current recursion mode.
static java.lang.String stateToString(int state)
          Renders a state as a string suitable for logging.
 void stop()
          Finishes the parsing and stops reading lines.
protected  void warn(MimeTokenStream.Event event)
          Logs (at warn) an indicative message based on the given event and the current state of the system.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

T_END_OF_STREAM

public static final int T_END_OF_STREAM
This token indicates that the MIME stream has been completely and successfully parsed, and no more data is available.

See Also:
Constant Field Values

T_START_MESSAGE

public static final int T_START_MESSAGE
This token indicates that the MIME stream is currently at the beginning of a message.

See Also:
Constant Field Values

T_END_MESSAGE

public static final int T_END_MESSAGE
This token indicates that the MIME stream is currently at the end of a message.

See Also:
Constant Field Values

T_RAW_ENTITY

public static final int T_RAW_ENTITY
This token indicates that a raw entity is currently being processed. You may call getInputStream() to obtain the raw entity data.

See Also:
Constant Field Values

T_START_HEADER

public static final int T_START_HEADER
This token indicates that a message part's headers are now being parsed.

See Also:
Constant Field Values

T_FIELD

public static final int T_FIELD
This token indicates that a message part's field has now been parsed. You may call getField() to obtain the raw field contents.

See Also:
Constant Field Values

T_END_HEADER

public static final int T_END_HEADER
This token indicates that part headers have now been parsed.

See Also:
Constant Field Values

T_START_MULTIPART

public static final int T_START_MULTIPART
This token indicates that a multipart body is being parsed.

See Also:
Constant Field Values

T_END_MULTIPART

public static final int T_END_MULTIPART
This token indicates that a multipart body has been parsed.

See Also:
Constant Field Values

T_PREAMBLE

public static final int T_PREAMBLE
This token indicates that a multipart's preamble is being parsed. You may call getInputStream() to access the preamble contents.

See Also:
Constant Field Values

T_EPILOGUE

public static final int T_EPILOGUE
This token indicates that a multipart's epilogue is being parsed. You may call getInputStream() to access the epilogue contents.

See Also:
Constant Field Values

T_START_BODYPART

public static final int T_START_BODYPART
This token indicates that the MIME stream is currently at the beginning of a body part.

See Also:
Constant Field Values

T_END_BODYPART

public static final int T_END_BODYPART
This token indicates that the MIME stream is currently at the end of a body part.

See Also:
Constant Field Values

T_BODY

public static final int T_BODY
This token indicates that an atomic entity is being parsed. Use getInputStream() to access the entity contents.

See Also:
Constant Field Values

M_RECURSE

public static final int M_RECURSE
Recursively parse every message/rfc822 part

See Also:
getRecursionMode(), Constant Field Values

M_NO_RECURSE

public static final int M_NO_RECURSE
Do not recurse into message/rfc822 parts

See Also:
getRecursionMode(), Constant Field Values

M_RAW

public static final int M_RAW
Parse into raw entities

See Also:
getRecursionMode(), Constant Field Values

Constructor Detail

MimeTokenStream

public MimeTokenStream()
Constructs a standard (lax) stream. Optional validation events will be logged only. Use createStrictValidationStream() to create a stream that strictly validates the input.

Method Detail

stateToString

public static final java.lang.String stateToString(int state)
Renders a state as a string suitable for logging.

Parameters:
state - the state to render
Returns:
the state rendered as a string, never null

createStrictValidationStream

public static final MimeTokenStream createStrictValidationStream()
Creates a stream that strictly validates the input.

Returns:
MimeTokenStream which throws a MimeException whenever possible issues are detected in the input

parse

public void parse(java.io.InputStream stream)
Instructs the MimeTokenStream to parse the given stream's contents. If the MimeTokenStream has already been in use, resets the stream's internal state.


isRaw

public boolean isRaw()
Determines if this parser is currently in raw mode.

Returns:
true if in raw mode, false otherwise.
See Also:
setRaw(boolean)

setRaw

public void setRaw(boolean raw)
Deprecated. Pass M_RAW to setRecursionMode(int) instead.

Enables or disables raw mode. In raw mode all future entities (messages or body parts) in the stream will be reported to the ContentHandler.raw(InputStream) handler method only. The stream will contain the entire unparsed entity contents including header fields and whatever is in the body.

Parameters:
raw - true enables raw mode, false disables it.

getRecursionMode

public int getRecursionMode()
Gets the current recursion mode. The recursion mode specifies the approach taken to parsing parts. M_RAW mode does not parse the part at all. M_RECURSE mode recursively parses each mail when a message/rfc822 part is encountered; M_NO_RECURSE does not.

Returns:
M_RECURSE, M_RAW or M_NO_RECURSE

setRecursionMode

public void setRecursionMode(int mode)
Sets the current recursion mode. The recursion mode specifies the approach taken to parsing parts. M_RAW mode does not parse the part at all. M_RECURSE mode recursively parses each mail when a message/rfc822 part is encountered; M_NO_RECURSE does not.

Parameters:
mode - M_RECURSE, M_RAW or M_NO_RECURSE
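
The three modes can be read as a small decision rule for each non-multipart part. The sketch below restates that rule in code as an aid to reading the description; it is not the library's implementation. The enums Mode and Handling and the method handlingFor are hypothetical names, used in place of the int constants so that no constant values have to be assumed.

```java
/** Sketch of the decision described for M_RAW, M_RECURSE and M_NO_RECURSE. */
final class RecursionModeSketch {

    /** Hypothetical mirror of the M_RECURSE, M_NO_RECURSE and M_RAW constants. */
    enum Mode { RECURSE, NO_RECURSE, RAW }

    /** Hypothetical labels for how a non-multipart part ends up being handled. */
    enum Handling { RAW_ENTITY, NESTED_MESSAGE, PLAIN_BODY }

    static Handling handlingFor(Mode mode, String mimeType) {
        if (mode == Mode.RAW) {
            return Handling.RAW_ENTITY; // raw mode: the part is not parsed at all
        }
        boolean embeddedMessage = "message/rfc822".equalsIgnoreCase(mimeType);
        if (embeddedMessage && mode == Mode.RECURSE) {
            return Handling.NESTED_MESSAGE; // parse the embedded mail recursively
        }
        return Handling.PLAIN_BODY; // everything else is treated as an ordinary body
    }

    public static void main(String[] args) {
        for (Mode mode : Mode.values()) {
            System.out.println(mode + " + message/rfc822 -> " + handlingFor(mode, "message/rfc822"));
        }
    }
}
```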

stop

public void stop()
Finishes the parsing and stops reading lines. NOTE: No more lines will be parsed but the parser will still call ContentHandler.endMultipart(), ContentHandler.endBodyPart(), ContentHandler.endMessage(), etc to match previous calls to ContentHandler.startMultipart(BodyDescriptor), ContentHandler.startBodyPart(), ContentHandler.startMessage(), etc.


getState

public int getState()
Returns the current state.


getField

public java.lang.String getField()
This method is valid if getState() returns T_FIELD.

Returns:
String with the field's raw contents.
Throws:
java.lang.IllegalStateException - if getState() returns a value other than T_FIELD.

getFieldName

public java.lang.String getFieldName()
This method is valid if getState() returns T_FIELD.

Returns:
String with the field's name.
Throws:
java.lang.IllegalStateException - if getState() returns a value other than T_FIELD.

getFieldValue

public java.lang.String getFieldValue()
This method is valid if getState() returns T_FIELD.

Returns:
String with the field's value.
Throws:
java.lang.IllegalStateException - if getState() returns a value other than T_FIELD.
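
The three field accessors relate to each other roughly as a raw header line relates to its two halves. The sketch below illustrates that relationship with plain string handling: the raw field is split at the first colon. FieldSplitSketch and split are hypothetical names, and trimming the whitespace around both parts is an assumption of this sketch; this page does not say whether getFieldValue() trims.

```java
/** Sketch: how a raw header field relates to its name and value. */
final class FieldSplitSketch {

    /** Returns {name, value} for a raw field such as "Subject: Hello". */
    static String[] split(String rawField) {
        int colon = rawField.indexOf(':'); // only the first colon separates name and value
        if (colon < 0) {
            throw new IllegalArgumentException("Not a header field: " + rawField);
        }
        String name = rawField.substring(0, colon).trim();
        String value = rawField.substring(colon + 1).trim(); // trimming is an assumption
        return new String[] { name, value };
    }

    public static void main(String[] args) {
        String[] parts = split("Content-Type: text/plain; charset=UTF-8");
        System.out.println("name  = " + parts[0]);
        System.out.println("value = " + parts[1]);
    }
}
```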

monitor

protected void monitor(MimeTokenStream.Event event)
                throws MimeException,
                       java.io.IOException
Monitors the given event. Subclasses may override to perform actions upon events. Base implementation logs at warn.

Parameters:
event - Event, not null
Throws:
MimeException - subclasses may elect to throw this exception upon invalid content
java.io.IOException - subclasses may elect to throw this exception

message

protected java.lang.String message(MimeTokenStream.Event event)
Creates an indicative message suitable for display based on the given event and the current state of the system.

Parameters:
event - Event, not null
Returns:
message suitable for use as a message in an exception or for logging

warn

protected void warn(MimeTokenStream.Event event)
Logs (at warn) an indicative message based on the given event and the current state of the system.

Parameters:
event - Event, not null

debug

protected void debug(MimeTokenStream.Event event)
Logs (at debug) an indicative message based on the given event and the current state of the system.

Parameters:
event - Event, not null

getInputStream

public java.io.InputStream getInputStream()
This method is valid if getState() returns T_RAW_ENTITY, T_PREAMBLE, or T_EPILOGUE. It returns the raw entity, preamble, or epilogue contents. The T_BODY description also directs callers to this method for the contents of an atomic entity.

Returns:
Data stream, depending on the current state.
Throws:
java.lang.IllegalStateException - getState() returns an invalid value.

getReader

public java.io.Reader getReader()
Gets a reader configured for the current body or body part. The reader will return a transfer- and charset-decoded stream of characters based on the MIME fields with the standard defaults. This is a convenience method and relies on getInputStream(). Consult the javadoc for that method for known limitations.

Returns:
Reader, not null
Throws:
java.lang.IllegalStateException - getState() returns an invalid value
UnsupportedCharsetException - if there is no JVM support for decoding the charset
IllegalCharsetNameException - if the charset name specified in the mime type is illegal
See Also:
getInputStream()
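
The charset half of this behaviour, including the two charset exceptions listed above, can be reproduced with the standard library alone. The sketch below is an illustration under stated assumptions and not the library's code: CharsetReaderSketch, readerFor, decode and classify are hypothetical names, falling back to US-ASCII when no charset is given is an assumption about the "standard defaults", and transfer decoding (base64, quoted-printable) is left out entirely.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.nio.charset.UnsupportedCharsetException;

/** Sketch: wrapping a body stream in a Reader for a declared charset. */
final class CharsetReaderSketch {

    /** Charset.forName throws the two unchecked charset exceptions named above. */
    static Reader readerFor(InputStream body, String charsetName) {
        String name = (charsetName == null) ? "US-ASCII" : charsetName; // assumed default
        return new InputStreamReader(body, Charset.forName(name));
    }

    /** Decodes a whole byte array; used only to keep the demo short. */
    static String decode(byte[] bytes, String charsetName) {
        StringBuilder text = new StringBuilder();
        try (Reader reader = readerFor(new ByteArrayInputStream(bytes), charsetName)) {
            for (int c = reader.read(); c != -1; c = reader.read()) {
                text.append((char) c);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return text.toString();
    }

    /** Reports which of the documented failure cases a charset name falls into. */
    static String classify(String charsetName) {
        try {
            Charset.forName(charsetName);
            return "ok";
        } catch (IllegalCharsetNameException e) {
            return "illegal-name";
        } catch (UnsupportedCharsetException e) {
            return "unsupported";
        }
    }

    public static void main(String[] args) {
        byte[] latin1 = { 'g', 'r', (byte) 0xFC, (byte) 0xDF, 'e' };
        System.out.println(decode(latin1, "ISO-8859-1"));
        System.out.println(classify("not a charset!"));
        System.out.println(classify("x-no-such-charset"));
    }
}
```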

getBodyDescriptor

public BodyDescriptor getBodyDescriptor()

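Gets a descriptor for the current entity. This method is valid if getState() returns a state such as T_BODY or T_START_MULTIPART, the two states in which the typical usage example above calls it.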

Returns:
BodyDescriptor, not null

next

public int next()
         throws java.io.IOException,
                MimeException
This method advances the token stream to the next token.

Throws:
java.lang.IllegalStateException - if the method is called although getState() has already returned T_END_OF_STREAM.
java.io.IOException
MimeException

newBodyDescriptor

protected BodyDescriptor newBodyDescriptor(BodyDescriptor pParent)
Creates a new instance of BodyDescriptor. Subclasses may override this in order to create body descriptors that provide more specific information.



Copyright © 2004-2008 The Apache Software Foundation. All Rights Reserved.