File Metadata
Extract comprehensive metadata and properties from files within the sandbox environment, including file attributes, content analysis, format-specific metadata, and security information.
📊 Metadata Extraction Capabilities
File metadata extraction provides detailed information about file properties, content structure, format-specific data, and security attributes to support analysis and decision-making.
Overview
The File Metadata tool extracts metadata from files in the sandbox environment at three configurable depths ('basic', 'detailed', 'comprehensive'), covering file system properties, content characteristics, format-specific information, and security attributes.
Key Features
- Comprehensive Properties - Extract file system attributes, timestamps, and permissions
- Format-Specific Metadata - Extract EXIF, document properties, and media information
- Content Analysis - Analyze file content structure and characteristics
- Security Information - Extract security-related metadata and attributes
- Bulk Processing - Analyze metadata for multiple files simultaneously
Methods
fileMetadata
Extract metadata and properties from files in the sandbox environment.
| Parameter | Type | Required | Description |
|---|---|---|---|
| filePath | String | Yes | Path to the file for metadata extraction |
| extractionLevel | String | No | Extraction depth: 'basic', 'detailed', 'comprehensive' (default: 'detailed') |
| includeContent | Boolean | No | Include content-based analysis (default: false) |
| formatSpecific | Boolean | No | Extract format-specific metadata (default: true) |
| securityScan | Boolean | No | Include security-related metadata (default: false) |
| computeHashes | Boolean | No | Calculate file checksums (default: false) |
```json
{
  "filePath": "/sandbox/documents/report.pdf",
  "extractionLevel": "comprehensive",
  "includeContent": true,
  "formatSpecific": true,
  "computeHashes": true
}
```
Output:
- success (Boolean) - Metadata extraction success status
- filePath (String) - Path to the analyzed file
- basicInfo (Object) - Basic file information
  - fileName (String) - File name
  - fileSize (Number) - File size in bytes
  - fileType (String) - File type/extension
  - mimeType (String) - MIME type
- timestamps (Object) - File timestamps
  - created (String) - Creation timestamp
  - modified (String) - Last modification timestamp
  - accessed (String) - Last access timestamp
- permissions (Object) - File permissions and ownership
- formatMetadata (Object) - Format-specific metadata
- contentAnalysis (Object) - Content structure analysis
- hashes (Object) - File checksums and hashes
- securityInfo (Object) - Security-related information
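An illustrative response for the request above; the values are hypothetical, and which objects appear depends on the extraction level and flags used:
```json
{
  "success": true,
  "filePath": "/sandbox/documents/report.pdf",
  "basicInfo": {
    "fileName": "report.pdf",
    "fileSize": 2457600,
    "fileType": "pdf",
    "mimeType": "application/pdf"
  },
  "timestamps": {
    "created": "2024-01-10T09:15:00Z",
    "modified": "2024-01-15T14:30:00Z",
    "accessed": "2024-01-20T08:45:00Z"
  },
  "hashes": {
    "sha256": "9f86d081884c7d65..."
  }
}
```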
Basic File Metadata
File System Properties
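A minimal sketch of a 'basic' extraction that reads only file system properties; the path is illustrative, and the printed fields follow the output schema above:
```python
# Basic extraction: file system attributes only, no hashing or content analysis
result = fileMetadata({
    "filePath": "/sandbox/documents/report.pdf",
    "extractionLevel": "basic"
})

if result['success']:
    info = result['basicInfo']
    print(f"Name: {info['fileName']} ({info['mimeType']})")
    print(f"Size: {info['fileSize'] / 1024:.1f} KB")
    print(f"Modified: {result['timestamps']['modified']}")
    print(f"Permissions: {result['permissions']}")
```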
Format-Specific Metadata
Document and Media Files
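With formatSpecific enabled, the formatMetadata object carries format-dependent fields, such as EXIF data for images or document properties for PDFs. A sketch; the keys inside formatMetadata vary by file format and are not enumerated here:
```python
# Pull format-specific metadata from a document and an image
for path in ["/sandbox/documents/report.pdf", "/sandbox/images/photo.jpg"]:
    result = fileMetadata({
        "filePath": path,
        "extractionLevel": "detailed",
        "formatSpecific": True
    })
    if result['success']:
        print(f"{path} ({result['basicInfo']['mimeType']}):")
        # Keys vary by format: EXIF tags for images, author/page count
        # for documents, codec/duration for audio and video
        for key, value in result.get('formatMetadata', {}).items():
            print(f"  {key}: {value}")
```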
Content Analysis and Security
Content Structure Analysis
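Content and security analysis are opt-in via includeContent and securityScan; they populate the contentAnalysis and securityInfo objects in the response. A sketch for triaging an unknown file (the example path is hypothetical):
```python
# Comprehensive pass over an unknown file: content structure,
# security attributes, and checksums in one call
result = fileMetadata({
    "filePath": "/sandbox/uploads/unknown_file.bin",
    "extractionLevel": "comprehensive",
    "includeContent": True,
    "securityScan": True,
    "computeHashes": True
})

if result['success']:
    print("Content analysis:", result.get('contentAnalysis', {}))
    print("Security info:", result.get('securityInfo', {}))
    # The SHA-256 hash is useful for duplicate detection and allow/deny lists
    print("SHA-256:", result.get('hashes', {}).get('sha256'))
```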
Bulk Metadata Operations
Batch Analysis
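The example below combines listFiles with per-file fileMetadata calls to profile a directory: size distribution, extension breakdown, oldest and newest files, and hash-based duplicate detection.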
```python
import os

def bulk_metadata_analysis(directory_path, analysis_type="comprehensive"):
    """Perform bulk metadata analysis on directory contents."""
    # Get all files in directory
    all_files = listFiles({
        "path": directory_path,
        "recursive": True,
        "includeMetadata": True
    })
    if not all_files['success']:
        return {"error": "Cannot access directory"}

    bulk_analysis = {
        "total_files": 0,
        "total_size_mb": 0,
        "by_extension": {},
        "by_size_category": {"small": 0, "medium": 0, "large": 0, "huge": 0},
        "oldest_file": None,
        "newest_file": None,
        "duplicate_hashes": {},
        "processing_errors": []
    }

    # Process each file
    for item in all_files['items']:
        if item['type'] == 'file':
            bulk_analysis["total_files"] += 1
            file_size_mb = item['size'] / (1024 * 1024)
            bulk_analysis["total_size_mb"] += file_size_mb

            # Categorize by size
            if file_size_mb < 1:
                bulk_analysis["by_size_category"]["small"] += 1
            elif file_size_mb < 10:
                bulk_analysis["by_size_category"]["medium"] += 1
            elif file_size_mb < 100:
                bulk_analysis["by_size_category"]["large"] += 1
            else:
                bulk_analysis["by_size_category"]["huge"] += 1

            # Track by extension
            file_ext = os.path.splitext(item['name'])[1].lower()
            if file_ext not in bulk_analysis["by_extension"]:
                bulk_analysis["by_extension"][file_ext] = {
                    "count": 0,
                    "total_size_mb": 0,
                    "files": []
                }
            bulk_analysis["by_extension"][file_ext]["count"] += 1
            bulk_analysis["by_extension"][file_ext]["total_size_mb"] += file_size_mb
            bulk_analysis["by_extension"][file_ext]["files"].append(item['path'])

            # Track oldest and newest
            mod_time = item['modified']
            if not bulk_analysis["oldest_file"] or mod_time < bulk_analysis["oldest_file"]["modified"]:
                bulk_analysis["oldest_file"] = item
            if not bulk_analysis["newest_file"] or mod_time > bulk_analysis["newest_file"]["modified"]:
                bulk_analysis["newest_file"] = item

            # Extract detailed metadata if requested
            if analysis_type == "comprehensive":
                try:
                    file_metadata = fileMetadata({
                        "filePath": item['path'],
                        "extractionLevel": "detailed",
                        "computeHashes": True
                    })
                    if file_metadata['success']:
                        # Track potential duplicates by hash
                        hashes = file_metadata.get('hashes', {})
                        if 'sha256' in hashes:
                            hash_value = hashes['sha256']
                            if hash_value not in bulk_analysis["duplicate_hashes"]:
                                bulk_analysis["duplicate_hashes"][hash_value] = []
                            bulk_analysis["duplicate_hashes"][hash_value].append(item['path'])
                except Exception as e:
                    bulk_analysis["processing_errors"].append({
                        "file": item['path'],
                        "error": str(e)
                    })

    # Find actual duplicates (more than one file with same hash)
    duplicates = {
        hash_val: paths for hash_val, paths in bulk_analysis["duplicate_hashes"].items()
        if len(paths) > 1
    }
    bulk_analysis["duplicates_found"] = duplicates

    # Generate summary report
    print(f"📊 Bulk Metadata Analysis for {directory_path}:")
    print(f"   Total files: {bulk_analysis['total_files']:,}")
    print(f"   Total size: {bulk_analysis['total_size_mb']:.2f} MB")
    print(f"   Processing errors: {len(bulk_analysis['processing_errors'])}")

    print(f"\n   Size distribution:")
    for category, count in bulk_analysis["by_size_category"].items():
        percentage = (count / bulk_analysis["total_files"]) * 100 if bulk_analysis["total_files"] > 0 else 0
        print(f"      {category}: {count} files ({percentage:.1f}%)")

    print(f"\n   Top file types by count:")
    top_extensions = sorted(
        bulk_analysis["by_extension"].items(),
        key=lambda x: x[1]["count"],
        reverse=True
    )[:10]
    for ext, info in top_extensions:
        ext_display = ext if ext else "(no extension)"
        print(f"      {ext_display}: {info['count']} files ({info['total_size_mb']:.2f} MB)")

    if duplicates:
        print(f"\n   🔍 Duplicate files found: {len(duplicates)} sets")
        for hash_val, paths in list(duplicates.items())[:5]:  # Show first 5
            print(f"      Hash {hash_val[:8]}...: {len(paths)} files")
            for path in paths:
                print(f"         - {path}")

    return bulk_analysis

# Usage
bulk_results = bulk_metadata_analysis("/sandbox/data", "comprehensive")
```
Error Handling
Common Metadata Issues
| Error Type | Cause | Resolution |
|---|---|---|
| File Access Error | Cannot read file | Check file permissions and existence |
| Format Not Supported | Unknown or unsupported file format | Use basic extraction level |
| Corrupt Metadata | Damaged file or metadata | Skip format-specific extraction |
| Large File Timeout | File too large for analysis | Use basic level or increase timeout |
| Permission Denied | Insufficient metadata access rights | Check file and directory permissions |
Robust Metadata Extraction
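The helper below retries extraction at progressively simpler levels, disabling format-specific, content, and hash extraction as it degrades, so one problematic file does not stall a batch.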
```python
def robust_metadata_extraction(file_path, fallback_levels=['comprehensive', 'detailed', 'basic']):
    """Extract metadata with fallback strategies for problematic files."""
    for level in fallback_levels:
        try:
            result = fileMetadata({
                "filePath": file_path,
                "extractionLevel": level,
                "formatSpecific": level != 'basic',
                "includeContent": level == 'comprehensive',
                "computeHashes": level in ['comprehensive', 'detailed']
            })
            if result['success']:
                print(f"✅ Metadata extracted using {level} level")
                return {
                    "success": True,
                    "extraction_level": level,
                    "metadata": result
                }
            else:
                print(f"⚠️ {level} level failed: {result.get('error')}")
        except Exception as e:
            print(f"💥 {level} level exception: {str(e)}")

    # All levels failed
    return {
        "success": False,
        "error": "All extraction levels failed",
        "file_path": file_path
    }

# Usage with fallback
robust_result = robust_metadata_extraction("/sandbox/problematic/corrupted_file.pdf")
if robust_result['success']:
    print(f"Metadata extracted successfully using {robust_result['extraction_level']} level")
else:
    print(f"Failed to extract metadata: {robust_result['error']}")
```
Related Tools
- List Files - Browse files to identify candidates for metadata analysis
- File Search - Find files with specific metadata characteristics
- Read File - Access file contents for detailed content analysis
Next Steps: Use with File Search to find files with specific metadata, or combine with List Files for comprehensive directory analysis.