
File Metadata

Extract comprehensive metadata and properties from files within the sandbox environment, including file attributes, EXIF and other format-specific data, content analysis, and security information.

📊 Metadata Extraction Capabilities

File metadata extraction provides detailed information about file properties, content structure, format-specific data, and security attributes to support analysis and decision-making.

Overview

The File Metadata tool extracts structured metadata from sandbox files at a configurable depth, ranging from basic file system attributes through format-specific properties to full content and security analysis, so that downstream tooling can classify, deduplicate, and audit files.

Key Features

  • Comprehensive Properties - Extract file system attributes, timestamps, and permissions
  • Format-Specific Metadata - Extract EXIF, document properties, and media information
  • Content Analysis - Analyze file content structure and characteristics
  • Security Information - Extract security-related metadata and attributes
  • Bulk Processing - Analyze metadata for multiple files simultaneously

Methods

fileMetadata

Extract metadata and properties from files in the sandbox environment.

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| filePath | String | Yes | Path to the file for metadata extraction |
| extractionLevel | String | No | Extraction depth: 'basic', 'detailed', or 'comprehensive' (default: 'detailed') |
| includeContent | Boolean | No | Include content-based analysis (default: false) |
| formatSpecific | Boolean | No | Extract format-specific metadata (default: true) |
| securityScan | Boolean | No | Include security-related metadata (default: false) |
| computeHashes | Boolean | No | Calculate file checksums (default: false) |

Example request:
{
  "filePath": "/sandbox/documents/report.pdf",
  "extractionLevel": "comprehensive",
  "includeContent": true,
  "formatSpecific": true,
  "computeHashes": true
}
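
The same request can be issued programmatically. This is a minimal sketch, assuming the sandbox exposes fileMetadata as a Python callable that takes this parameter object and returns the structure described under Output, as the later examples on this page do:

# Minimal sketch: issue the request above and check the result.
# Assumes `fileMetadata` is available as a callable in the sandbox,
# matching the convention used in the examples further down.
result = fileMetadata({
    "filePath": "/sandbox/documents/report.pdf",
    "extractionLevel": "comprehensive",
    "includeContent": True,
    "formatSpecific": True,
    "computeHashes": True
})

if result['success']:
    print(f"Analyzed {result['filePath']}")
else:
    print(f"Extraction failed: {result.get('error')}")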

Output:

  • success (Boolean) - Metadata extraction success status
  • filePath (String) - Path to the analyzed file
  • basicInfo (Object) - Basic file information
    • fileName (String) - File name
    • fileSize (Number) - File size in bytes
    • fileType (String) - File type/extension
    • mimeType (String) - MIME type
  • timestamps (Object) - File timestamps
    • created (String) - Creation timestamp
    • modified (String) - Last modification timestamp
    • accessed (String) - Last access timestamp
  • permissions (Object) - File permissions and ownership
  • formatMetadata (Object) - Format-specific metadata
  • contentAnalysis (Object) - Content structure analysis
  • hashes (Object) - File checksums and hashes
  • securityInfo (Object) - Security-related information

Basic File Metadata

File System Properties
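
A basic extraction returns the file system view of a file: name, size, type, timestamps, and permissions. The sketch below reads only the fields documented under Output; since the schema above defines permissions simply as an object, it is printed as returned rather than assuming inner keys:

# Sketch: read file system properties from a basic extraction.
# Only fields documented in the Output section above are used;
# `permissions` is printed as returned since its keys are unspecified.
meta = fileMetadata({
    "filePath": "/sandbox/documents/report.pdf",
    "extractionLevel": "basic"
})

if meta['success']:
    info = meta['basicInfo']
    times = meta['timestamps']
    print(f"Name:     {info['fileName']} ({info['mimeType']})")
    print(f"Size:     {info['fileSize']:,} bytes")
    print(f"Created:  {times['created']}")
    print(f"Modified: {times['modified']}")
    print(f"Accessed: {times['accessed']}")
    print(f"Permissions: {meta.get('permissions')}")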

Format-Specific Metadata

Document and Media Files
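
With formatSpecific enabled (the default), the formatMetadata object carries whatever the format supports: EXIF for images, document properties for PDFs, media information for audio and video. This is a sketch under the assumption that formatMetadata exposes keys such as 'exif', 'author', and 'pageCount'; the actual keys depend on the file format and are not fixed by the schema above:

# Sketch: pull format-specific metadata for a photo and a document.
# Keys inside `formatMetadata` ('exif', 'author', 'pageCount') are
# illustrative assumptions; actual keys vary by format.
photo = fileMetadata({
    "filePath": "/sandbox/images/photo.jpg",
    "extractionLevel": "detailed",
    "formatSpecific": True
})
if photo['success']:
    exif = photo.get('formatMetadata', {}).get('exif', {})
    print(f"EXIF tags found: {len(exif)}")

doc = fileMetadata({
    "filePath": "/sandbox/documents/report.pdf",
    "extractionLevel": "detailed",
    "formatSpecific": True
})
if doc['success']:
    props = doc.get('formatMetadata', {})
    print(f"Author: {props.get('author', 'unknown')}")
    print(f"Pages:  {props.get('pageCount', 'unknown')}")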

Content Analysis and Security

Content Structure Analysis
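
Enabling includeContent and securityScan populates the contentAnalysis and securityInfo objects alongside the regular fields. Their inner schemas are not specified above, so this sketch prints them as returned; the file path is illustrative:

# Sketch: request content and security analysis together.
# `contentAnalysis` and `securityInfo` inner schemas are not
# documented above, so they are printed as returned.
analysis = fileMetadata({
    "filePath": "/sandbox/downloads/archive.zip",
    "extractionLevel": "comprehensive",
    "includeContent": True,
    "securityScan": True,
    "computeHashes": True
})

if analysis['success']:
    print("Content analysis:", analysis.get('contentAnalysis'))
    print("Security info:", analysis.get('securityInfo'))
    sha256 = analysis.get('hashes', {}).get('sha256')
    if sha256:
        print(f"SHA-256: {sha256}")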

Bulk Metadata Operations

Batch Analysis

import os

def bulk_metadata_analysis(directory_path, analysis_type="comprehensive"):
    """Perform bulk metadata analysis on directory contents."""
    
    # Get all files in directory
    all_files = listFiles({
        "path": directory_path,
        "recursive": True,
        "includeMetadata": True
    })
    
    if not all_files['success']:
        return {"error": "Cannot access directory"}
    
    bulk_analysis = {
        "total_files": 0,
        "total_size_mb": 0,
        "by_extension": {},
        "by_size_category": {"small": 0, "medium": 0, "large": 0, "huge": 0},
        "oldest_file": None,
        "newest_file": None,
        "duplicate_hashes": {},
        "processing_errors": []
    }
    
    # Process each file
    for item in all_files['items']:
        if item['type'] == 'file':
            bulk_analysis["total_files"] += 1
            file_size_mb = item['size'] / (1024*1024)
            bulk_analysis["total_size_mb"] += file_size_mb
            
            # Categorize by size
            if file_size_mb < 1:
                bulk_analysis["by_size_category"]["small"] += 1
            elif file_size_mb < 10:
                bulk_analysis["by_size_category"]["medium"] += 1
            elif file_size_mb < 100:
                bulk_analysis["by_size_category"]["large"] += 1
            else:
                bulk_analysis["by_size_category"]["huge"] += 1
            
            # Track by extension
            file_ext = os.path.splitext(item['name'])[1].lower()
            if file_ext not in bulk_analysis["by_extension"]:
                bulk_analysis["by_extension"][file_ext] = {
                    "count": 0,
                    "total_size_mb": 0,
                    "files": []
                }
            
            bulk_analysis["by_extension"][file_ext]["count"] += 1
            bulk_analysis["by_extension"][file_ext]["total_size_mb"] += file_size_mb
            bulk_analysis["by_extension"][file_ext]["files"].append(item['path'])
            
            # Track oldest and newest
            mod_time = item['modified']
            if not bulk_analysis["oldest_file"] or mod_time < bulk_analysis["oldest_file"]["modified"]:
                bulk_analysis["oldest_file"] = item
            if not bulk_analysis["newest_file"] or mod_time > bulk_analysis["newest_file"]["modified"]:
                bulk_analysis["newest_file"] = item
            
            # Extract detailed metadata if requested
            if analysis_type == "comprehensive":
                try:
                    file_metadata = fileMetadata({
                        "filePath": item['path'],
                        "extractionLevel": "detailed",
                        "computeHashes": True
                    })
                    
                    if file_metadata['success']:
                        # Track potential duplicates by hash
                        hashes = file_metadata.get('hashes', {})
                        if 'sha256' in hashes:
                            hash_value = hashes['sha256']
                            if hash_value not in bulk_analysis["duplicate_hashes"]:
                                bulk_analysis["duplicate_hashes"][hash_value] = []
                            bulk_analysis["duplicate_hashes"][hash_value].append(item['path'])
                    
                except Exception as e:
                    bulk_analysis["processing_errors"].append({
                        "file": item['path'],
                        "error": str(e)
                    })
    
    # Find actual duplicates (more than one file with same hash)
    duplicates = {
        hash_val: paths for hash_val, paths in bulk_analysis["duplicate_hashes"].items()
        if len(paths) > 1
    }
    bulk_analysis["duplicates_found"] = duplicates
    
    # Generate summary report
    print(f"📊 Bulk Metadata Analysis for {directory_path}:")
    print(f"   Total files: {bulk_analysis['total_files']:,}")
    print(f"   Total size: {bulk_analysis['total_size_mb']:.2f} MB")
    print(f"   Processing errors: {len(bulk_analysis['processing_errors'])}")
    
    print(f"\n   Size distribution:")
    for category, count in bulk_analysis["by_size_category"].items():
        percentage = (count / bulk_analysis["total_files"]) * 100 if bulk_analysis["total_files"] > 0 else 0
        print(f"     {category}: {count} files ({percentage:.1f}%)")
    
    print(f"\n   Top file types by count:")
    top_extensions = sorted(
        bulk_analysis["by_extension"].items(),
        key=lambda x: x[1]["count"],
        reverse=True
    )[:10]
    
    for ext, info in top_extensions:
        ext_display = ext if ext else "(no extension)"
        print(f"     {ext_display}: {info['count']} files ({info['total_size_mb']:.2f} MB)")
    
    if duplicates:
        print(f"\n   🔍 Duplicate files found: {len(duplicates)} sets")
        for hash_val, paths in list(duplicates.items())[:5]:  # Show first 5
            print(f"     Hash {hash_val[:8]}...: {len(paths)} files")
            for path in paths:
                print(f"       - {path}")
    
    return bulk_analysis

# Usage
bulk_results = bulk_metadata_analysis("/sandbox/data", "comprehensive")

Error Handling

Common Metadata Issues

| Error Type | Cause | Resolution |
| --- | --- | --- |
| File Access Error | Cannot read file | Check file permissions and existence |
| Format Not Supported | Unknown or unsupported file format | Use basic extraction level |
| Corrupt Metadata | Damaged file or metadata | Skip format-specific extraction |
| Large File Timeout | File too large for analysis | Use basic level or increase timeout |
| Permission Denied | Insufficient metadata access rights | Check file and directory permissions |
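
One way to apply these resolutions proactively is to choose the extraction level from the file size before attempting a heavy pass. This is a sketch assuming a basic extraction is always cheap enough to run first; the 100 MB threshold is an illustrative choice, not a documented limit of the tool:

# Sketch: avoid large-file timeouts by probing size first.
# The 100 MB cutoff is an illustrative assumption, not a
# documented limit of the tool.
def size_aware_metadata(file_path):
    probe = fileMetadata({"filePath": file_path, "extractionLevel": "basic"})
    if not probe['success']:
        return probe

    size_mb = probe['basicInfo']['fileSize'] / (1024 * 1024)
    level = "basic" if size_mb > 100 else "comprehensive"
    return fileMetadata({
        "filePath": file_path,
        "extractionLevel": level,
        "computeHashes": size_mb <= 100
    })

# Usage
metadata = size_aware_metadata("/sandbox/data/large_dataset.csv")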

Robust Metadata Extraction

def robust_metadata_extraction(file_path, fallback_levels=['comprehensive', 'detailed', 'basic']):
    """Extract metadata with fallback strategies for problematic files."""
    
    for level in fallback_levels:
        try:
            result = fileMetadata({
                "filePath": file_path,
                "extractionLevel": level,
                "formatSpecific": level != 'basic',
                "includeContent": level == 'comprehensive',
                "computeHashes": level in ['comprehensive', 'detailed']
            })
            
            if result['success']:
                print(f"✅ Metadata extracted using {level} level")
                return {
                    "success": True,
                    "extraction_level": level,
                    "metadata": result
                }
            else:
                print(f"⚠️ {level} level failed: {result.get('error')}")
        
        except Exception as e:
            print(f"💥 {level} level exception: {str(e)}")
    
    # All levels failed
    return {
        "success": False,
        "error": "All extraction levels failed",
        "file_path": file_path
    }

# Usage with fallback
robust_result = robust_metadata_extraction("/sandbox/problematic/corrupted_file.pdf")
if robust_result['success']:
    print(f"Metadata extracted successfully using {robust_result['extraction_level']} level")
else:
    print(f"Failed to extract metadata: {robust_result['error']}")

Next Steps: Use with File Search to find files with specific metadata, or combine with List Files for comprehensive directory analysis.